Glossary of Terms


1. Butterfly: The DFT computation of Figure 3.4, so named because its flowgraph has the appearance of a "butterfly".

2. Fixed Radix: The term "radix" is commonly used to describe a specific FFT decomposition. A "fixed" radix means that all the factors of N are the same.

3. Mixed Radix: The factors of N are not all identical.

4. Relatively Prime: The numbers in a given set are said to be (pairwise) relatively prime when no two numbers in the set share a common factor other than 1. For example, (2, 3, 7, 9) is not a relatively prime set because 3 and 9 share the factor 3, whereas (2, 3, 5, 7) is relatively prime.

5. Square and Square-free Factors: For the case where N = 4 · 3 · 7 · 4, the "4s" are square factors and the 3 and 7 are square-free.

6. Twiddle Factors: The term refers to the complex multipliers of Figure 3.8 which pre-multiply the FFT butterflies. They are sometimes called phase or rotation factors.
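The pairwise condition in item 4 can be checked mechanically. The following is a minimal Python sketch, assuming the usual test that two integers are relatively prime when their greatest common divisor is 1; the function name `relatively_prime` is hypothetical.

```python
from itertools import combinations
from math import gcd


def relatively_prime(factors):
    """Return True if every pair of numbers in `factors` has gcd 1 (hypothetical helper)."""
    return all(gcd(a, b) == 1 for a, b in combinations(factors, 2))


print(relatively_prime([2, 3, 7, 9]))   # False: 3 and 9 share the factor 3
print(relatively_prime([2, 3, 5, 7]))   # True
```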


Abstract 


A comprehensive comparison of the most efficient Discrete Fourier Transform (DFT) techniques is presented. The DFT algorithms selected are the fixed radix Fast Fourier Transform (FFT), the mixed radix FFT, the Winograd Fourier Transform Algorithm (WFTA), and the Prime Factor Algorithm (PFA). The algorithms are compared on the basis of the number of real multiplications, real additions, and memory arrays required as a function of sequence length N. This paper reviews the literature, selects the most efficient DFT FORTRAN programs available, develops the number of real multiplications and additions as a function of N, and compares the algorithms using tables and plots of real multiplications, additions, and memory arrays. This comparison shows that the WFTA and PFA require the fewest real multiplications and additions, but the fixed radix and mixed radix FFTs require the least memory. The mixed radix FFT is much more flexible than the WFTA or PFA, since N can be any sequence length. The WFTA and PFA are closely studied and the tradeoffs between the two are discussed. For most sequence lengths the PFA uses fewer additions but more multiplications, which means the WFTA is more efficient when multiplications are "costly" relative to additions.

The PFA uses less memory than the WFTA, making the PFA preferable when machine memory is limited. Based on the results of the paper, an algorithm is presented to select the most efficient DFT for an N-length sequence given the multiply speed, add speed, and memory size of the computer.


I. Introduction


1.1 Background

Computing the Discrete Fourier Transform (DFT) of N points has many applications in scientific and engineering calculations. In 1965 Cooley and Tukey described an algorithm which became known as the Fast Fourier Transform (FFT) because it reduced the number of complex operations required to compute the DFT from $N^2$ to $N \log_2 N$, where $N = 2^m$, m an integer. Using ideas proposed in the Cooley-Tukey paper, Singleton wrote and published a mixed radix algorithm in 1969 which permitted N to be any positive integer sequence length.

In 1976 Winograd proposed a mixed radix DFT algorithm which (1) converted the DFT to circular convolution, (2) used fast convolution algorithms to perform "short-DFTs", and (3) nested these short-DFTs into a structure to perform long Fourier transforms on complex data sequences. This algorithm became known as the Winograd Fourier Transform Algorithm (WFTA). The WFTA kept the real additions count at the FFT level while significantly reducing the real multiplications required.

Kolba and Parks, 1977, used Winograd's fast convolution algorithms and proposed a new Prime Factor Algorithm (PFA). This new algorithm modified the short-DFTs to use "shifts" instead of multiplication by 1/2 and did not use the nested structure of the WFTA. As a consequence, the PFA uses more real multiplications and fewer additions than the WFTA for a given sequence length N.

1.2 Problem

Both Winograd, 1976, and Kolba and Parks, 1977, compared their operations counts to that of the FFT but did not include all possible WFTA and PFA sequence lengths. Further, no comparisons were made on the basis of the memory arrays required by each algorithm as a function of N. This paper presents a comprehensive comparison of fixed radix FFTs, mixed radix FFTs, the WFTA, and the PFA based on real operations and memory arrays. This comparison provides the information needed to select the most efficient algorithm to perform the DFT based on machine size, machine speed, and real operations.

1.3 Scope

This paper reviews the literature, selects DFT algorithms for comparison, studies the theory of each algorithm selected, develops the real operation and memory counts as a function of N, compares these algorithms using tables and plots of operation and memory counts, and presents an algorithm to select the most efficient technique.


The DFT algorithms selected for study and comparison are:

(1) Radix-2 FFT
(2) Radix-3 FFT
(3) Radix-3 FFT in the R(u) field
(4) Radix-5 FFT
(5) Mixed radix FFT written by the author
(6) Mixed radix FFT written by Singleton
(7) Mixed radix FFT available from the International Mathematical Subroutine Library (IMSL) on the CDC Cyber 74
(8) WFTA
(9) PFA.

Each of these algorithms has a particular advantage, which makes selection of the best algorithm dependent on the machine size, machine speed, and sequence length.

1.4 Assumptions

To a first approximation, the speed of an FFT algorithm is proportional to the number of complex multiplications used. The number of times the data array is indexed is, however, an important secondary factor (Singleton, 1969). Kolba and Parks, 1977, substantiated this assumption by timing the PFA and FFTs on an IBM 370/155 for several sequence lengths and showing that the FORTRAN-coded PFA (having fewer real additions and multiplications) was faster than the FFT FORTRAN algorithms.
In 1978 Morris demonstrated that the sequence of arithmetic operations in a DFT algorithm's internal structure can result in different execution times "between ostensibly equivalent algorithms on a given machine" and that computer-dependent algorithm/architecture interactions may also alter the relative performance of the different algorithms. He modified the FORTRAN-coded radix-4 FFT and WFTA programs, matched them to the PDP 11/55 and IBM 370/168 architectures, and showed that the WFTA offered neither time nor space advantages over the radix-4 FFT. Morris achieved these results because "the radix-4 FFT appears almost ideally matched to the PDP-11 architecture" whereas the WFTA "has extra load/store burdens" and requires extra data array indexing.

Morris demonstrated that it may be possible to optimize DFT algorithms to match a certain machine; however, this type of optimization of the FORTRAN DFT algorithms is outside the scope of this paper. It is assumed that existing FORTRAN-coded DFT algorithms will not be modified and that selecting an algorithm which minimizes real operations produces the most efficient algorithm.

This paper derives and tabulates real operations counts as a function of N for the algorithms listed in Section 1.3. The most efficient DFT algorithms are timed on the CDC Cyber 74 computer and compared to the predicted execution time based on real operations. These predicted times are shown to be consistent with the timing results.


1.5 Approach and Presentation

A literature review is presented in Chapter II, which starts with the 1965 Cooley-Tukey paper and follows the various DFT algorithm developments up through the 1977 Kolba-Parks article. The review puts Rader's 1968 landmark paper in perspective with Winograd's "nested" DFT algorithm and the subsequent work by Kolba and Parks.

Next, the theory behind the DFT algorithms is reviewed, the real operations count is developed, and the memory array count needed for a sequence length N is determined. The general expressions for real operations and memory array counts are developed from published articles or from the background theory and are then plotted and tabulated as a function of N. Readers familiar with the FFT and Winograd background theory may wish to skip Sections 3.1 and 3.2.

In Chapter IV, comparison tables and plots of the DFT algorithms make it possible to select the most efficient algorithm based on the real operations and memory arrays required. Timing results from the CDC Cyber 74 system for representative sequence lengths are tabulated to substantiate the assumption that minimizing real operations equates to maximizing efficiency. An algorithm is also presented at the end of Chapter IV which uses the tables in this paper to select the most efficient DFT technique given the sequence length, memory size, and computer add and multiply speeds.


Conclusions and recommendations are presented in Chapter V.


II. LITERATURE REVIEW


The calculation of the Discrete Fourier Transform (DFT) is a central operation in digital signal processing, but it was not widely used for other than trivial sequence lengths because of the cumbersome DFT evaluation:

$$X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j2\pi nk/N) \qquad (2.1)$$

which required on the order of $N^2$ complex operations.
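To make the $N^2$ cost of Eq (2.1) concrete, here is a minimal Python sketch of direct evaluation (an illustration only, not one of the FORTRAN programs compared in this paper); the name `direct_dft` is hypothetical. Each of the N outputs is a sum of N complex products.

```python
import cmath


def direct_dft(x):
    """Evaluate Eq (2.1) term by term: N outputs, each a sum of N complex products."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


X = direct_dft([1, 2, 3, 4])   # approximately [10, -2+2j, -2, -2-2j]
```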

In 1965 Cooley and Tukey published "An Algorithm for the Machine Calculation of Complex Fourier Series," which stimulated the widespread use of an algorithm that became known as the "Fast Fourier Transform" (FFT). Their paper proposed an efficient method of computing the DFT by factoring an N-length sequence into its prime components:

$$N = n_1 n_2 \cdots n_m \qquad (2.2)$$

and then decomposing Eq (2.1) into m steps with $N/n_i$ transformations within each step. If $n_1 = n_2 = \cdots = n_m = 2$, the operations are reduced from the previous $N^2$ level to the $N \log_2 N$ level.

Most of the early work on the FFT (Bergland, 1968) was directed toward the special cases where $N = 2^m$, which yielded simple and efficient algorithms. These algorithms are efficient because no multiplications are needed to evaluate the 2-point DFT butterflies, which can reduce the operations count below the $N \log_2 N$ level.
Other "fixed radix" algorithms were studied, and Dubois and Venetsanopoulos published "A New Radix-3 Algorithm" in 1978, which demonstrated that a radix-3 butterfly could be computed without multiplications by defining a new basis (1, u) instead of using the complex plane (1, i) basis, where u is a complex cube root of unity. This technique was later shown to be limited to the special cases of $3^m$ and $6^m$ (Burrus and Parks, 1979).
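The identity behind the (1, u) basis is $1 + u + u^2 = 0$, so $u^2 = -1 - u$ and multiplying a number $a + bu$ by $u$ or $u^2$ only permutes and negates its two coordinates. The Python sketch below illustrates that idea; it is not the Dubois and Venetsanopoulos program. It assumes the data are already held as coordinate pairs (a, b) meaning a + b·u (converting ordinary complex data into this basis is not shown), takes $u = \exp(j2\pi/3)$, and uses hypothetical function names. With $W_3 = \exp(-j2\pi/3) = u^2$, the 3-point DFT then needs only additions and subtractions.

```python
import cmath

U = cmath.exp(2j * cmath.pi / 3)   # u, a primitive complex cube root of unity


def mul_u(z):
    """u*(a + b u) = -b + (a - b) u, using u^2 = -1 - u."""
    a, b = z
    return (-b, a - b)


def mul_u2(z):
    """u^2*(a + b u) = (b - a) - a u, using u^3 = 1."""
    a, b = z
    return (b - a, -a)


def add(*zs):
    """Coordinate-wise sum in the (1, u) basis."""
    return (sum(z[0] for z in zs), sum(z[1] for z in zs))


def dft3_u_basis(x0, x1, x2):
    """3-point DFT with W = u^2; only additions and sign changes are used."""
    X0 = add(x0, x1, x2)
    X1 = add(x0, mul_u2(x1), mul_u(x2))   # x0 + W x1 + W^2 x2
    X2 = add(x0, mul_u(x1), mul_u2(x2))   # x0 + W^2 x1 + W^4 x2
    return X0, X1, X2


def to_complex(z):
    """Convert a (1, u) pair back to an ordinary complex number for checking."""
    return z[0] + z[1] * U


xs = [(1, 2), (3, -1), (0, 4)]
X = dft3_u_basis(*xs)
print(X)   # ((4, 5), (-7, -5), (6, 6))
```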

Based on Cooley and Tukey's paper, "mixed-radix" algorithms were written by Brenner and Singleton. The most efficient and popular of these algorithms was "An Algorithm for Computing the Mixed Radix Fast Fourier Transform," published in 1969 by Singleton, which is frequently used in digital signal processing where a wider choice of N is needed. The Singleton algorithm can perform the DFT using FFT techniques for any sequence length N but is most efficient when N is highly composite from the set of integers 2, 3, 4, and 5. If N is a prime number, the algorithm performs a DFT using $N^2$ operations. The Singleton algorithm became the standard against which all future DFT techniques were measured.

In 1968 Rader presented "DFTs when the Number of Data Samples Is Prime," which showed that the DFT of a prime-length sequence contains an (N-1)-point circular convolution. He showed how to isolate the convolution by applying a permutation to the (N-1) signal points x(1), x(2), ..., x(N-1).

He also gave the permutation applied to the complex multipliers from the set $\{\exp(-j2\pi nk/N),\ k = 1, 2, \ldots, N-1\}$. Both permutations were generated by using a "primitive" root, which exists for prime N (McClellan and Rader, 1979). Rader's paper was largely overlooked for many years but took on new significance when Winograd presented his new DFT algorithm, "On Computing the Discrete Fourier Transform," in 1976.
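Rader's permutation can be illustrated in a few lines. The Python sketch below is an illustration under stated assumptions, not Rader's or Winograd's code: for prime N it finds a primitive root g by brute force, reorders the inputs as $x(g^{-q})$ and the multipliers as $W^{g^j}$, and evaluates the resulting (N-1)-point cyclic convolution directly. In the WFTA that convolution would instead be done by a fast convolution algorithm; computing it directly here only shows that the permutation turns the DFT into a convolution. Function names are hypothetical.

```python
import cmath


def primitive_root(p):
    """Smallest g whose powers g^1..g^(p-1) mod p hit every nonzero residue (p prime)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(1, p)}) == p - 1:
            return g
    raise ValueError("p must be an odd prime")


def rader_dft(x):
    """DFT of prime length p via an explicit (p-1)-point cyclic convolution."""
    p = len(x)
    g = primitive_root(p)
    g_inv = pow(g, p - 2, p)           # g^-1 mod p by Fermat's little theorem
    M = p - 1
    # Permuted inputs a[q] = x(g^-q) and permuted multipliers c[j] = W^(g^j)
    a = [x[pow(g_inv, q, p)] for q in range(M)]
    c = [cmath.exp(-2j * cmath.pi * pow(g, j, p) / p) for j in range(M)]
    # Cyclic convolution conv[m] = sum_q a[q] c[(m - q) mod M]
    conv = [sum(a[q] * c[(m - q) % M] for q in range(M)) for m in range(M)]
    X = [0j] * p
    X[0] = sum(x)                      # the k = 0 output needs no multiplications
    for m in range(M):
        X[pow(g, m, p)] = x[0] + conv[m]   # output index k = g^m
    return X


x = [1, 2 + 1j, -1, 0.5, 3, 0, -2j]   # prime length p = 7
print(primitive_root(7))   # 3
```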

Winograd combined Rader's idea of converting a DFT to circular convolution with his own fast convolution algorithms to produce a new DFT method called the "Winograd Fourier Transform Algorithm" (WFTA). Winograd provided the fast convolution algorithms for short prime and prime-power length sequences and proposed that longer transforms be computed by "nesting" the short, high-speed transforms. He presented a table comparing the WFTA to radix-2 FFT operations and showed that the number of additions remained at the FFT levels while the number of multiplications was significantly reduced.

Kolba and Parks published "A Prime Factor FFT Algorithm Using High Speed Convolution" in 1977, which modified Winograd's fast convolution algorithms to permit "shifts" instead of multiplications by 1/2. They also replaced the nested structure of the WFTA with a conventional FFT decomposition. The decomposition of the sequence was based on an algorithm proposed by Thomas, 1963, in his article "Using a Computer to Solve Problems in Physics," which uses an index mapping based on the Chinese Remainder Theorem. Kolba and Parks selected several N-length sequences and compared their operations counts to those of the WFTA and FFT.
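The Chinese Remainder Theorem mapping that Kolba and Parks took from Thomas is often called the Good-Thomas mapping. The Python sketch below shows it for two relatively prime factors $N = N_1 N_2$, assuming the textbook form of the maps: inputs are read at $n = (N_2 n_1 + N_1 n_2) \bmod N$ and outputs are written at the CRT index $k = (N_2 t_1 k_1 + N_1 t_2 k_2) \bmod N$, with $t_1 = N_2^{-1} \bmod N_1$ and $t_2 = N_1^{-1} \bmod N_2$. The short DFTs are evaluated directly here, whereas the PFA uses Winograd-style fast convolution modules for them; the point of the sketch is that no twiddle factors appear between the two passes. Names are hypothetical, and `pow(a, -1, m)` needs Python 3.8 or later.

```python
import cmath
from math import gcd


def dft(x):
    """Direct short DFT, standing in for a fast short-DFT module."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def pfa_dft(x, N1, N2):
    """Prime factor (Good-Thomas) DFT for N = N1*N2 with gcd(N1, N2) == 1."""
    N = N1 * N2
    if len(x) != N or gcd(N1, N2) != 1:
        raise ValueError("need len(x) == N1*N2 with N1, N2 relatively prime")
    # Input map: A[n1][n2] = x((N2*n1 + N1*n2) mod N)
    A = [[x[(N2 * n1 + N1 * n2) % N] for n2 in range(N2)] for n1 in range(N1)]
    # N1-point DFTs for each n2, then N2-point DFTs for each k1; no twiddle factors
    cols = [dft([A[n1][n2] for n1 in range(N1)]) for n2 in range(N2)]   # cols[n2][k1]
    B = [dft([cols[n2][k1] for n2 in range(N2)]) for k1 in range(N1)]   # B[k1][k2]
    # Output map (Chinese Remainder Theorem)
    t1, t2 = pow(N2, -1, N1), pow(N1, -1, N2)
    X = [0j] * N
    for k1 in range(N1):
        for k2 in range(N2):
            X[(N2 * t1 * k1 + N1 * t2 * k2) % N] = B[k1][k2]
    return X


x = [complex(n, n % 4) for n in range(15)]
X = pfa_dft(x, 3, 5)   # should agree with dft(x) to rounding error
```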

Paralleling Winograd's fast convolution work are the studies of number theoretic transforms (NTTs), which have been proposed for digital cyclic convolution and digital filtering. The NTTs were first published by Pollard, 1971, in "The Fast Fourier Transform in the Finite Field". He showed that a transform analogous to the DFT exists in the finite (or Galois) field, where the $\exp(j2\pi nk/N)$ terms are replaced by $r^{nk}$ in the DFT expression such that:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, r^{nk} \qquad (2.3)$$

Notice that Pollard chose the alternative definition of the DFT in which the exponent of e is positive. The r term is defined in the Galois field (GF) such that the same cyclic convolution properties exist in GF as in the complex field for the DFT. He then proved that this analogous DFT could apply prime factor decomposition to the N-length sequence and perform $N/n_i$ transformations to reduce the operations in GF to the $N \log_2 N$ level, which provided the FFT in GF. Pollard proposed that this technique be applied to cyclic convolutions in GF, multiplication of polynomials over $GF(p^n)$, aperiodic convolution of integer sequences, multiplication of very large integers, division of polynomials over GF(p), and a chirp-z transform for NTTs (McClellan and Rader, 1979).
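Pollard's transform can be illustrated with ordinary integer arithmetic modulo a prime. In the Python sketch below, the modulus p = 17, the length N = 8, and the root r = 2 are illustrative assumptions (2 has multiplicative order 8 modulo 17, since $2^4 = 16 \equiv -1$). The forward transform follows Eq (2.3); multiplying two transforms pointwise and inverting gives the cyclic convolution modulo p. In the output the true value 22 appears as 5 because the modulus is too small, which is why a practical NTT needs a modulus larger than any convolution output. Function names are hypothetical, and `pow(a, -1, p)` requires Python 3.8 or later.

```python
P = 17   # prime modulus (illustrative choice)
R = 2    # element of order 8 modulo 17, so transforms of length 8 are possible


def ntt(x, r=R, p=P):
    """Eq (2.3) over the integers mod p: X(k) = sum_n x(n) r^(nk) mod p."""
    n_len = len(x)
    return [sum(x[n] * pow(r, n * k, p) for n in range(n_len)) % p
            for k in range(n_len)]


def intt(X, r=R, p=P):
    """Inverse transform: x(n) = N^-1 * sum_k X(k) r^(-nk) mod p."""
    n_len = len(X)
    r_inv, n_inv = pow(r, -1, p), pow(n_len, -1, p)
    return [n_inv * sum(X[k] * pow(r_inv, n * k, p) for k in range(n_len)) % p
            for n in range(n_len)]


def cyclic_convolution(a, b, p=P):
    """Direct cyclic convolution mod p, for comparison."""
    n_len = len(a)
    return [sum(a[m] * b[(n - m) % n_len] for m in range(n_len)) % p
            for n in range(n_len)]


a = [1, 2, 3, 0, 0, 0, 0, 0]
b = [4, 5, 0, 0, 0, 0, 0, 0]
via_ntt = intt([(A * B) % P for A, B in zip(ntt(a), ntt(b))])
print(via_ntt)                   # [4, 13, 5, 15, 0, 0, 0, 0]
print(cyclic_convolution(a, b))  # [4, 13, 5, 15, 0, 0, 0, 0]
```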
Pollard's paper stimulated more study of the NTTs. Reed and Truong's 1975 paper, "The Use of Finite Fields to Compute Convolutions," includes complex-valued NTTs. It was shown that this NTT over $GF(q^2)$ can reduce convolution operations to the FFT levels. If q is sufficiently large, the NTT can be used over $GF(q^2)$ to transform a sequence of complex integers x(n) into X(k) on $GF(q^2)$, for which the inverse transform of X(k) on $GF(q^2)$ is precisely the original sequence x(n). Using these ideas, filtering or convolutions without roundoff errors can be obtained on a sequence of complex integers.

Most applications of the NTTs have been in the areas of digital filtering and convolution. The author was not able to find any NTT algorithm which could be compared to the FFT, WFTA, or PFA and perform all the same functions as these three algorithms.

PFA, WFTA, and FFT represent the most efficient and flexible FORTRAN programs available to perform the DFT. Each algorithm has its own particular advantage over the other two depending on machine size and speed for a particular sequence length. None of the articles reviewed presents a comprehensive evaluation or comparison of the three algorithms based on the real operations and memory arrays required to perform a DFT for any sequence length N. This paper fills that need so that an efficient algorithm can
III. FFT Theory

The set of algorithms known as the Fast Fourier Transforms (FFTs) uses a variety of methods to reduce the computation time required to evaluate the Discrete Fourier Transform (DFT). The DFT is the central part of most spectrum analysis problems, and the FFT can improve performance by a factor of 100 or more over direct evaluation of the DFT (Rabiner and Gold, 1975). Therefore, the FFT is crucially important to digital signal processing techniques.

This section begins with "fixed radix" FFT algorithms by discussing a "decimation-in-time" algorithm, the data reordering (bit reversal) theory, the real operations (addition and multiplication) count, and a new fixed radix algorithm in the finite field, and then summarizes the memory required to use the fixed radix algorithms. Next, the conventional "mixed" radix algorithms are presented by discussing the theory, digit reversal, real operations count, and memory required to use the mixed radix algorithms. This theory chapter concludes with a discussion of mixed radix algorithms based on fast convolution. The theory, data reordering, real operations count, and memory are also presented for these algorithms.

Before the FFT algorithms are discussed, some comments must be made about computing the trigonometric function values needed to evaluate the FFT.
3.1 Computing Trigonometric Function Values

The trigonometric values used in FFTs can be represented as values on the unit circle. The values are based on integer powers of

$$\exp(-j2\pi/N)$$

which can be computed using sine and cosine functions. It is useful to have accurate methods of generating the sine and cosine terms other than repeated use of the library sine and cosine functions.

The method most widely used in FFT algorithms (Singleton, 1967) generates the trigonometric functions by a difference equation given by:

$$\cos((k+1)a) = \left(C\cos(ka) - S\sin(ka)\right) + \cos(ka)$$

$$\sin((k+1)a) = \left(C\sin(ka) + S\cos(ka)\right) + \sin(ka)$$

where

$$C = -2\sin^2(a/2), \qquad S = \sin(a), \qquad \cos(0) = 1, \qquad \sin(0) = 0$$

This technique is used for all FFTs presented in this paper (except where noted otherwise) because it minimizes calls to the FORTRAN library functions cos(·) and sin(·), thereby reducing the overall FFT computation time.
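A minimal Python transcription of the difference equation above, assuming $a = 2\pi/N$ as in the FFT application; only two library calls are made, to form C and S. The function name `unit_circle_table` is hypothetical. The FFT multiplier $\exp(-jka)$ is then $\cos(ka) - j\sin(ka)$. Writing C as $-2\sin^2(a/2)$ rather than $\cos(a) - 1$ avoids the cancellation error of subtracting nearly equal numbers when a is small.

```python
import math


def unit_circle_table(N):
    """Return (cos(k*a), sin(k*a)) for a = 2*pi/N, k = 0..N-1, via the recurrence."""
    a = 2.0 * math.pi / N
    C = -2.0 * math.sin(a / 2.0) ** 2   # C = -2 sin^2(a/2)
    S = math.sin(a)                      # S = sin(a)
    c, s = 1.0, 0.0                      # cos(0) = 1, sin(0) = 0
    table = []
    for _ in range(N):
        table.append((c, s))
        # Both updates use the old (c, s), exactly as in the difference equation
        c, s = (C * c - S * s) + c, (C * s + S * c) + s
    return table


table = unit_circle_table(64)
worst = max(abs(c - math.cos(2 * math.pi * k / 64)) + abs(s - math.sin(2 * math.pi * k / 64))
            for k, (c, s) in enumerate(table))
print(worst)   # a small, rounding-level error
```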
3.2 Fixed Radix Algorithms

While FFT algorithms are well known and widely used, they are relatively intricate and somewhat difficult to grasp at first reading. Two excellent textbooks (Rabiner and Gold, 1975; Oppenheim and Schafer, 1975) discuss FFT theory in great detail and present FFTs based on decimation in time and in frequency. Both texts spend a great deal of time discussing the radix-2 FFT, which is the most widely known and used. For this reason, the radix-2 development is presented here as a convenience for the reader; it also provides a theoretical background from which the other fixed radix algorithms are derived.

3.2.1 Development of Radix-2 Theory. To achieve the reduction in complex operations (defined as four real multiplications and two real additions) from $N^2$ to $N \log_2 N$, it is necessary to decompose the DFT computation into successively smaller DFT computations. In this way the symmetry and periodicity of the complex exponential $\exp(-j2\pi nk/N) = W_N^{nk}$ can be exploited. This radix-2 algorithm is based on decomposition of the sequence x(n) in the DFT expression:

$$X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j2\pi nk/N), \qquad k = 0, 1, \ldots, N-1, \quad N = 2^m \qquad (3.1)$$

and is known as a "decimation-in-time" algorithm (Oppenheim and Schafer, 1975). Since N is an even integer,



X(k) can be computed by separating x(n) into two N/2-length sequences consisting of the even-numbered points and the odd-numbered points in x(n). Using n = 2r for n even and n = 2r+1 for n odd, Eq (3.1) becomes:

$$X(k) = \sum_{r=0}^{T} x(2r) W_N^{2rk} + \sum_{r=0}^{T} x(2r+1) W_N^{(2r+1)k} \qquad (3.2)$$

where $T = (N/2) - 1$ and $W_N = \exp(-j2\pi/N)$. By expanding $W_N^{(2r+1)k} = W_N^{k} W_N^{2rk}$ and factoring out $W_N^k$, Eq (3.2) can be rewritten as:

$$X(k) = \sum_{r=0}^{T} x(2r) \left(W_N^2\right)^{rk} + W_N^k \sum_{r=0}^{T} x(2r+1) \left(W_N^2\right)^{rk} \qquad (3.3)$$

But $W_N^2 = \exp(-j4\pi/N) = \exp(-j2\pi/(N/2)) = W_{N/2}$, and Eq (3.3) can be written as:

$$X(k) = \sum_{r=0}^{T} x(2r) W_{N/2}^{rk} + W_N^k \sum_{r=0}^{T} x(2r+1) W_{N/2}^{rk} = G(k) + W_N^k H(k) \qquad (3.4)$$

Each of the sums in Eq (3.4) is an N/2-point DFT, the first sum being over the even-numbered points of the original sequence and the second sum over the odd-numbered points. Although the index k = 0, 1, ..., N-1, each of the sums in Eq (3.4) need only be computed over k = 0, 1, ..., (N/2)-1, since G(k) and H(k) are periodic in k with period N/2. After the two DFTs in Eq (3.4) are computed, they are combined to yield the N-point DFT, X(k). Figure 3.1 indicates the computation involved in computing X(k) according to Eq (3.4) for an eight-point sequence.

Figure 3.1. Flowgraph of the Decimation-in-Time Decomposition of an N-Point DFT Computation into Two N/2-Point DFT Computations (N = 8). NOTE: The integers on the branches of the flowgraph represent the powers of $W_N$; i.e., the "4" represents $W_N^4$.

Figure 3.1 (Oppenheim and Schafer, 1975) uses the signal flow conventions such that branches entering a node are summed to produce the node variable. When no coefficient is shown, the branch transmittance is assumed to be one. For other branches, the transmittance of a branch is an integer power of $W_N$. Note in Figure 3.1 that two four-point DFTs are computed, giving G(k) and H(k). X(0) is obtained by multiplying H(0) by $W_N^0$ and adding the product to G(0). X(1) is obtained by multiplying H(1) by $W_N^1$ and adding the result to G(1). For X(4) it would follow that H(4) is multiplied by $W_N^4$ and added to G(4); however, since G(k) and H(k) are both periodic in k with period 4, H(4) = H(0) and G(4) = G(0). Thus X(4) results from multiplying H(0) by $W_N^4$ and adding the product to G(0).
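Eq (3.4) and the periodicity argument above can be checked numerically. This Python sketch is illustrative only: it computes G(k) and H(k) as direct N/2-point DFTs, indexes them with k mod N/2 to use their period of N/2, and combines them with $W_N^k$. The helper names are hypothetical.

```python
import cmath


def dft(x):
    """Direct DFT of Eq (3.1), used for the two half-length transforms."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def one_level_dit(x):
    """X(k) = G(k) + W_N^k H(k), Eq (3.4), for even N."""
    N = len(x)
    if N % 2:
        raise ValueError("N must be even")
    half = N // 2
    G = dft(x[0::2])          # N/2-point DFT of the even-numbered points
    H = dft(x[1::2])          # N/2-point DFT of the odd-numbered points
    # G and H have period N/2, so only k = 0..N/2-1 is computed and then reused
    return [G[k % half] + cmath.exp(-2j * cmath.pi * k / N) * H[k % half]
            for k in range(N)]


x = [1, 2, 0, -1, 3, 0.5, -2, 1]   # N = 8, as in Figure 3.1
X = one_level_dit(x)               # agrees with dft(x) to rounding error
```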

With the N-point DFT computed according to Eq (3.4), the number of computations can be compared with that of the direct DFT computation of Eq (3.1). For the direct computation, without using symmetry properties, $N^2$ complex multiplications were required. Eq (3.4) requires computation of two N/2-point DFTs, which require $2(N/2)^2$ complex multiplications and about $2(N/2)^2$ complex additions (Oppenheim and Schafer, 1975). The two N/2-point DFTs must be combined, requiring N complex multiplications, corresponding to multiplying the second sum by $W_N^k$, and then N complex additions, corresponding to adding the product to the first sum. As a result, the computation of Eq (3.4) for all values of k requires $N + 2(N/2)^2$, or $N + N^2/2$, complex multiplications and additions. For $N > 2$, $N + N^2/2$ is less than $N^2$.
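The counts just derived are easy to tabulate. A minimal Python sketch, using the same counting as the text (one split, with the two N/2-point DFTs still evaluated directly); the function names are hypothetical.

```python
def direct_count(N):
    """Complex multiplications for direct evaluation of Eq (3.1)."""
    return N * N


def one_split_count(N):
    """N for combining, plus 2(N/2)^2 for the two direct N/2-point DFTs."""
    return N + 2 * (N // 2) ** 2


for N in (8, 64, 1024):
    print(N, direct_count(N), one_split_count(N))
# 8 64 40
# 64 4096 2112
# 1024 1048576 525312
```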

The expression in Eq (3.4) corresponds to decimating the original N-point sequence into odd and even N/2-point sequences. Since $N = 2^m$, the N/2-point sequences also have even length, so each of G(k) and H(k) can be further decimated into two N/4-point DFTs, which could then be combined to yield the N/2-point DFTs. Writing $g(r) = x(2r)$ and $h(r) = x(2r+1)$ for the even- and odd-numbered points, decimating the N/2-point sequences in Eq (3.4) into N/4-point sequences gives:

$$G(k) = \sum_{r=0}^{(N/2)-1} g(r) W_{N/2}^{rk} = \sum_{p=0}^{(N/4)-1} g(2p) W_{N/2}^{2pk} + \sum_{p=0}^{(N/4)-1} g(2p+1) W_{N/2}^{(2p+1)k}$$

Letting $R = (N/4) - 1$,

$$G(k) = \sum_{p=0}^{R} g(2p) W_{N/4}^{pk} + W_{N/2}^{k} \sum_{p=0}^{R} g(2p+1) W_{N/4}^{pk} \qquad (3.5)$$

Similarly,

$$H(k) = \sum_{p=0}^{R} h(2p) W_{N/4}^{pk} + W_{N/2}^{k} \sum_{p=0}^{R} h(2p+1) W_{N/4}^{pk} \qquad (3.6)$$

If the four-point DFTs in Figure 3.1 are computed using Eqs (3.5) and (3.6), then that computation would be carried out as indicated in Figure 3.2. Inserting the computation in Figure 3.2 into the flowgraph of Figure 3.1 produces the complete flowgraph in Figure 3.3. Note that $W_{N/2} = W_N^2$ was used.
For the 8-point DFT that has been used as an example, the computation has been reduced to a computation of N/4-point DFTs, where N/4 = 2. An example 2-point DFT, for x(0) and x(4), is shown in Figure 3.4. The complete flowgraph for the computation of the 8-point DFT is shown in Figure 3.5; it was obtained by inserting the computation of Figure 3.4 into Figure 3.3.

Considering the more general case with N a power of 2 greater than 3, the same decimation procedure would be continued by decomposing the N/4-point transforms in Eqs (3.5) and (3.6) into N/8-point transforms. This requires v stages of computation, where $v = \log_2 N$. Recall that in the original decomposition of the N-point transform into two N/2-point transforms, the number of complex multiplications and additions required was $N + 2(N/2)^2$. When the N/2-point transforms were decomposed into N/4-point transforms, the factor of $(N/2)^2$ is replaced by $N/2 + 2(N/4)^2$, so that the overall computation now requires $N + N + 4(N/4)^2$ complex multiplications and additions. If $N = 2^v$, this can be done at most $v = \log_2 N$ times, "so that after carrying out this decomposition as many times as possible the number of complex multiplications and additions is equal to $N \log_2 N$" (Oppenheim and Schafer, 1975).
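Applying the split of Eq (3.4) recursively until one-point DFTs remain gives a complete radix-2 decimation-in-time FFT. The Python sketch below is a minimal illustration, not one of the programs compared in this paper. It counts one complex multiplication for every $W_N^k$ branch, including trivial ones such as $W_N^0$, which matches the Figure 3.5 style of counting, so the total should come out to $N \log_2 N$. The function name and the list-based counter are hypothetical.

```python
import cmath


def fft_radix2(x, counter):
    """Recursive decimation-in-time FFT; counter[0] accumulates W_N^k multiplications."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("N must be a power of 2")
    if N == 1:
        return list(x)                    # a 1-point DFT is the sample itself
    half = N // 2
    G = fft_radix2(x[0::2], counter)      # even-numbered points
    H = fft_radix2(x[1::2], counter)      # odd-numbered points
    X = []
    for k in range(N):
        counter[0] += 1                   # one multiplication by W_N^k per output
        X.append(G[k % half] + cmath.exp(-2j * cmath.pi * k / N) * H[k % half])
    return X


count = [0]
X = fft_radix2([1, 2, 0, -1, 3, 0.5, -2, 1], count)
print(count[0])   # 24 = 8 * log2(8)
```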

The flowgraph of Figure 3.5 displays the operations explicitly. By counting branches with transmittances of the form $W_N^r$, it is seen that each stage has N complex multiplications and N complex additions. Since there are $\log_2 N$ stages, there are a total of $N \log_2 N$ complex multiplications and additions, as shown before. Further reductions in the complex operations count can be achieved by exploiting the symmetry and periodicity of $W_N^r$.

Note that each "stage" of Figure 3.5 takes a set of N complex numbers and transforms them into another set of N complex numbers. This process is repeated $v = \log_2 N$ times, resulting in the DFT computation. For example, in computing the first stage of Figure 3.5, one set of storage registers would contain the input data sequence and a second set of storage registers would contain the computed results for the first stage. The sequence of numbers resulting from the mth stage of computation is denoted as $X_m(i)$, where $i = 0, 1, \ldots, N-1$ and $m = 1, 2, \ldots, v$. For the following stage, the previous output array $X_m(i)$ becomes the input array, and the new output array is $X_{m+1}(i)$ for the (m+1)st stage of computation. Using this notation, it can be seen that the basic flowgraph in Figure 3.5 is given by Figure 3.6. Using the notation of Figure 3.6, the equations of the butterfly are given by:

$$X_{m+1}(p) = X_m(p) + W_N^r X_m(q) \qquad (3.7)$$

$$X_{m+1}(q) = X_m(p) + W_N^{r+N/2} X_m(q) \qquad (3.8)$$

Because of the appearance of Figure 3.6, the computations of Eqs (3.7) and (3.8) are referred to as "butterfly" computations.
The number of complex multiplications can be reduced by a factor of 2 using the symmetry:

$$W_N^{N/2} = \exp\left(-j(2\pi/N) \cdot N/2\right) = \exp(-j\pi) = -1 \qquad (3.9)$$

so that $W_N^{r+N/2} = -W_N^r$ and Eqs (3.7) and (3.8) become:

$$X_{m+1}(p) = X_m(p) + W_N^r X_m(q) \qquad (3.10)$$

$$X_{m+1}(q) = X_m(p) - W_N^r X_m(q) \qquad (3.11)$$

Eqs  (3.10)  and  (3.11)  are  shown  in  Figure  3.7  which  reflects 
the  "twiddle  factor"  out  front  in  the  butterfly.  Since 
there  are  N/2  "butterflies"  of  the  form  of  Figure  3.7  per 
stage  and  log2  N  stages,  the  total  number  of  complex 
multiplications  required  is  (N/2)  log2N  instead  of  the 
N  log2N  used  in  Figure  3.5.  Using  the  "twiddle  factor" 
butterfly  flowgraph  of  Figure  3.6  as  a  replacement  for  the 
butterfly  of  Figure  3.4,  the  Figure  3.8  is  obtained. 
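As a sketch of how the twiddle-factor butterfly of Eqs (3.10) and (3.11) produces a complete transform, the following Python fragment implements a recursive radix-2 decimation-in-time FFT. It is a minimal illustration, not the FORTRAN program of Appendix A: the names `dft` and `fft_radix2` are hypothetical, and the counter tallies one complex multiplication per butterfly (trivial $W_N^0$ products included), which reproduces the (N/2) log2 N figure.

```python
import cmath


def dft(x):
    """Reference DFT: X(k) = sum_n x(n) W_N^{nk}."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


def fft_radix2(x):
    """Radix-2 decimation-in-time FFT built from the twiddle-factor butterfly
        X_{m+1}(p) = X_m(p) + W_N^r X_m(q)
        X_{m+1}(q) = X_m(p) - W_N^r X_m(q)
    Returns (transform, number of complex multiplications performed)."""
    n = len(x)
    if n == 1:
        return list(x), 0
    if n % 2:
        raise ValueError("length must be a power of 2")
    even, mult_even = fft_radix2(x[0::2])  # N/2-point DFT of even-indexed samples
    odd, mult_odd = fft_radix2(x[1::2])    # N/2-point DFT of odd-indexed samples
    out = [0j] * n
    mults = mult_even + mult_odd
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # the single twiddle multiply
        mults += 1
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t  # W_N^{N/2} = -1 supplies the minus sign
    return out, mults


X, mults = fft_radix2([1, 2, 3, 4, 0, 0, 0, 0])
print(mults)  # 12 = (8/2) * log2(8)
```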

3.2.2 Development of Radix-3 FFT Theory. Starting with the restriction that the N-point sequence be an integer power of three (N = 3^m, m = 1, 2, 3, ...), the DFT X(k) was computed by separating the discrete-time sequence x(n) into three N/3-point sequences. X(k) is given by the DFT expression:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \qquad W_N = \exp(-j2\pi/N) \qquad (3.12)$$

Breaking x(n) into three N/3-point sequences yields x(3r), x(3r+1), and x(3r+2). Substituting these into Eq (3.12) and adjusting the respective summations to run from 0 to (N/3)-1 yields:
$$X(k) = \sum_{r=0}^{P} x(3r)\, W_N^{3rk} + \sum_{r=0}^{P} x(3r+1)\, W_N^{(3r+1)k} + \sum_{r=0}^{P} x(3r+2)\, W_N^{(3r+2)k} \qquad (3.13)$$

where P = (N/3)-1.

By regrouping the exponents, Eq (3.13) can be rewritten as:

$$X(k) = \sum_{r=0}^{P} x(3r)\, W_N^{3rk} + W_N^{k} \sum_{r=0}^{P} x(3r+1)\, W_N^{3rk} + W_N^{2k} \sum_{r=0}^{P} x(3r+2)\, W_N^{3rk} \qquad (3.14)$$

By rewriting $W_N^3$ as:

$$W_N^{3} = \exp(-j6\pi/N) = \exp\big(-j2\pi/(N/3)\big) = W_{N/3} \qquad (3.15)$$

Eq (3.14) can be expressed as:

$$X(k) = \sum_{r=0}^{P} x(3r)\, W_{N/3}^{rk} + W_N^{k} \sum_{r=0}^{P} x(3r+1)\, W_{N/3}^{rk} + W_N^{2k} \sum_{r=0}^{P} x(3r+2)\, W_{N/3}^{rk} \qquad (3.16)$$

Each of the sums in Eq (3.16) represents an N/3-point DFT: the first is the N/3-point DFT of the 3r points in the original sequence, the second is the N/3-point DFT of the 3r+1 points, and the third is the N/3-point DFT of the 3r+2 points of the original sequence. Although the index k of X(k) ranges over N values (k = 0, 1, ..., N-1), each of the summations in Eq (3.16) is periodic in k with period N/3 and therefore needs to be computed for only N/3 values of k (k = 0, 1, ..., (N/3)-1). Eq (3.16) can be rewritten to reflect this:
$$X(k) = F(k) + W_N^{k}\, G(k) + W_N^{2k}\, H(k) \qquad (3.17)$$
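Eq (3.17) translates directly into a recursive radix-3 FFT. The sketch below is illustrative only (the function names are hypothetical); it assumes N is a power of 3 and uses `k % (N/3)` to exploit the periodicity of F(k), G(k), and H(k) instead of the in-place flowgraph of Figure 3.9.

```python
import cmath


def dft(x):
    """Reference DFT, Eq (3.12): X(k) = sum_n x(n) W_N^{nk}."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


def fft_radix3(x):
    """Radix-3 decimation in time, Eq (3.17): X(k) = F(k) + W_N^k G(k) + W_N^2k H(k)."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 3:
        raise ValueError("length must be a power of 3")
    f = fft_radix3(x[0::3])  # F(k): N/3-point DFT of x(3r)
    g = fft_radix3(x[1::3])  # G(k): N/3-point DFT of x(3r+1)
    h = fft_radix3(x[2::3])  # H(k): N/3-point DFT of x(3r+2)
    third = n // 3
    out = []
    for k in range(n):
        w = cmath.exp(-2j * cmath.pi * k / n)  # W_N^k
        i = k % third                          # F, G, H repeat with period N/3
        out.append(f[i] + w * g[i] + w * w * h[i])
    return out


x9 = [complex(v, -v) for v in range(9)]
print(max(abs(a - b) for a, b in zip(fft_radix3(x9), dft(x9))))  # rounding-level difference
```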

Eq (3.17) can be implemented as the butterfly flowgraph in Figure 3.9 using the accepted notational conventions (Oppenheim and Schafer, 1975). The convention used for the flowgraph is that when no coefficient is shown, the branch transmittance is assumed to be one. For other branches the transmittance (multiplier) is an integer power of $W_N$. In Figure 3.9 there are three N/3-point DFTs: F(k) designates the three-point DFT of the 3r points, G(k) the three-point DFT of the 3r+1 points, and H(k) the three-point DFT of the 3r+2 points, where r = 0, 1, ..., (N/3)-1.

For the 9-point case (N = 9), X(0) is obtained by (1) multiplying H(0) by a branch transmittance of 1 (which equals $W_9^0$), (2) multiplying G(0) by 1, (3) multiplying F(0) by 1, and (4) summing the three. Likewise, X(1) is obtained by multiplying H(1) by $W_9^2$, multiplying G(1) by $W_9^1$, and adding the results to F(1). X(6) has H(6) multiplied by $W_9^{12}$ and G(6) multiplied by $W_9^6$, with the products added to F(6), giving:

$$X(6) = F(6) + W_9^{6}\, G(6) + W_9^{12}\, H(6) \qquad (3.18)$$

However, since F(k), G(k), and H(k) are all periodic in k with period N/3 = 3, the periodicity can be exploited to yield F(6) = F(0), G(6) = G(0), and H(6) = H(0). These results can be substituted into Eq (3.18) to give:

$$X(6) = F(0) + W_9^{6}\, G(0) + W_9^{12}\, H(0) \qquad (3.19)$$
Continuing to use the periodic properties, the results for X(0) through X(8) are:

$$X(0) = F(0) + G(0) + H(0) \qquad (3.20)$$
$$X(1) = F(1) + W_9^{1}\, G(1) + W_9^{2}\, H(1) \qquad (3.21)$$
$$X(2) = F(2) + W_9^{2}\, G(2) + W_9^{4}\, H(2) \qquad (3.22)$$
$$X(3) = F(0) + W_9^{3}\, G(0) + W_9^{6}\, H(0) \qquad (3.23)$$
$$X(4) = F(1) + W_9^{4}\, G(1) + W_9^{8}\, H(1) \qquad (3.24)$$
$$X(5) = F(2) + W_9^{5}\, G(2) + W_9^{10}\, H(2) \qquad (3.25)$$
$$X(6) = F(0) + W_9^{6}\, G(0) + W_9^{12}\, H(0) \qquad (3.26)$$
$$X(7) = F(1) + W_9^{7}\, G(1) + W_9^{14}\, H(1) \qquad (3.27)$$
$$X(8) = F(2) + W_9^{8}\, G(2) + W_9^{16}\, H(2) \qquad (3.28)$$


Eqs (3.20) through (3.28) conclude the first-stage decimation of the 9-point sequence. The DFT computation has been reduced to computations of N/3-point DFTs, where N/3 = 3.

An example 3-point DFT for x(0), x(3), and x(6) is shown in Figure 3.10. The complete flowgraph for the computation of the 9-point DFT is shown in Figure 3.11 and was obtained by substituting the computation of Figure 3.10 into Figure 3.9.

Considering the more general case with N a power of 3, the same decimation procedure would be continued by decomposing the N/3-point DFTs F(k), G(k), and H(k) into N/9-point computations. Writing f(r) = x(3r), g(r) = x(3r+1), and h(r) = x(3r+2), the DFT F(k) is:

$$F(k) = \sum_{r=0}^{(N/3)-1} f(r)\, W_{N/3}^{rk} \qquad (3.29)$$


Letting Q = (N/9)-1, this equation can be divided into three N/9-length sequences:

$$F(k) = \sum_{i=0}^{Q} f(3i)\, W_{N/3}^{3ik} + \sum_{i=0}^{Q} f(3i+1)\, W_{N/3}^{(3i+1)k} + \sum_{i=0}^{Q} f(3i+2)\, W_{N/3}^{(3i+2)k} \qquad (3.30)$$


Expanding the exponents, Eq (3.30) can be rewritten as:

$$F(k) = \sum_{i=0}^{Q} f(3i)\, W_{N/3}^{3ik} + W_{N/3}^{k} \sum_{i=0}^{Q} f(3i+1)\, W_{N/3}^{3ik} + W_{N/3}^{2k} \sum_{i=0}^{Q} f(3i+2)\, W_{N/3}^{3ik} \qquad (3.31)$$

Using the substitution $W_{N/3}^{3} = W_{N/9}$:

$$F(k) = \sum_{i=0}^{Q} f(3i)\, W_{N/9}^{ik} + W_{N/3}^{k} \sum_{i=0}^{Q} f(3i+1)\, W_{N/9}^{ik} + W_{N/3}^{2k} \sum_{i=0}^{Q} f(3i+2)\, W_{N/9}^{ik} \qquad (3.32)$$


Similar expressions for G(k) and H(k) can be derived:

$$G(k) = \sum_{i=0}^{Q} g(3i)\, W_{N/9}^{ik} + W_{N/3}^{k} \sum_{i=0}^{Q} g(3i+1)\, W_{N/9}^{ik} + W_{N/3}^{2k} \sum_{i=0}^{Q} g(3i+2)\, W_{N/9}^{ik} \qquad (3.33)$$

$$H(k) = \sum_{i=0}^{Q} h(3i)\, W_{N/9}^{ik} + W_{N/3}^{k} \sum_{i=0}^{Q} h(3i+1)\, W_{N/9}^{ik} + W_{N/3}^{2k} \sum_{i=0}^{Q} h(3i+2)\, W_{N/9}^{ik} \qquad (3.34)$$
Eqs (3.32) through (3.34) can be used to derive the general expression for a radix-3 butterfly flowgraph. Letting N = 9 (so that Q = 0), the expressions for F(k), G(k), and H(k) become:

$$\begin{aligned} F(0) &= f(0) + W_3^{0} f(1) + W_3^{0} f(2) \\ F(1) &= f(0) + W_3^{1} f(1) + W_3^{2} f(2) \\ F(2) &= f(0) + W_3^{2} f(1) + W_3^{4} f(2) \end{aligned} \qquad (3.35)$$

$$\begin{aligned} G(0) &= g(0) + W_3^{0} g(1) + W_3^{0} g(2) \\ G(1) &= g(0) + W_3^{1} g(1) + W_3^{2} g(2) \\ G(2) &= g(0) + W_3^{2} g(1) + W_3^{4} g(2) \end{aligned} \qquad (3.36)$$

$$\begin{aligned} H(0) &= h(0) + W_3^{0} h(1) + W_3^{0} h(2) \\ H(1) &= h(0) + W_3^{1} h(1) + W_3^{2} h(2) \\ H(2) &= h(0) + W_3^{2} h(1) + W_3^{4} h(2) \end{aligned} \qquad (3.37)$$

From Eqs (3.35) through (3.37), the butterfly multipliers are derived (consistent with Oppenheim and Schafer) to be:

$$X(k) = F(k) + W_N^{k}\, G(k) + W_N^{2k}\, H(k) \qquad (3.38)$$

$$X(k+r) = F(k) + W_N^{k+r}\, G(k) + W_N^{2k+2r}\, H(k) \qquad (3.39)$$

$$X(k+2r) = F(k) + W_N^{k+2r}\, G(k) + W_N^{2k+4r}\, H(k) \qquad (3.40)$$

where r represents the distance between the endpoints of the butterfly. In Figure 3.11, r = 1 for stage 1 and r = 3 for stage 2. Eqs (3.38) through (3.40) are represented in Figure 3.12, which is the general radix-3 butterfly flowgraph.

The exponents of Figure 3.12 can be rewritten as:

$$W_N^{k+r} = W_N^{k}\, W_N^{r} \qquad (3.41)$$

$$W_N^{2k+2r} = W_N^{2k}\, W_N^{2r} \qquad (3.42)$$

$$W_N^{k+2r} = W_N^{k}\, W_N^{2r} \qquad (3.43)$$

$$W_N^{2k+4r} = W_N^{2k}\, W_N^{4r} \qquad (3.44)$$

With these expressions for the butterfly multipliers, an alternative arrangement to Figure 3.12 is possible by "premultiplying" or "twiddling" the inputs to G(k) and H(k) (Gentleman and Sande, 1966). The multipliers $W_N^{k}$ and $W_N^{2k}$ represent the twiddle factors of the butterfly in Figure 3.13. Since N = 3r (Oppenheim and Schafer, 1975), the butterfly multipliers can be reduced to:

$$W_N^{r} = W_{3r}^{r} = \exp(-j2\pi r/3r) = \exp(-j2\pi/3) = -0.5 - j0.866 \qquad (3.45)$$

$$W_N^{2r} = W_{3r}^{2r} = \exp(-j4\pi/3) = -0.5 + j0.866 \qquad (3.46)$$

$$W_N^{4r} = W_{3r}^{4r} = \exp(-j8\pi/3) = -0.5 - j0.866 \qquad (3.47)$$

Oppenheim and Schafer observed that there is no advantage in using the alternate twiddle factor version of Figure 3.13 instead of Figure 3.12 because "exp(-j2π/3) and all the powers thereof are complex coefficients that require multiplications". However, for the particular FORTRAN radix-3 FFT programs which implemented Figures 3.12 and 3.13, the twiddle factor version of the radix-3 FFT was much more efficient to implement: only two twiddle factors ($W_N^{k}$ and $W_N^{2k}$) had to be computed per butterfly and the butterfly multipliers were the constants in Eqs (3.45) and (3.46), whereas the original version of Figure 3.12 requires that all six complex multipliers be computed for each butterfly. The twiddle factor version therefore represents a simplification over the original radix-3 butterfly.
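The equivalence between the direct radix-3 butterfly of Eqs (3.38) through (3.40) and the twiddle-factor form of Figure 3.13 can be checked numerically. In this hypothetical sketch (function names are illustrative, not taken from the FORTRAN programs) the twiddled version applies only $W_N^{k}$ and $W_N^{2k}$ and then the constants of Eqs (3.45) through (3.47); it assumes the N = 3r case, i.e. the butterfly spans the full length of the sub-transform.

```python
import cmath


def w(e, n):
    """W_n^e = exp(-j*2*pi*e/n)."""
    return cmath.exp(-2j * cmath.pi * e / n)


def butterfly3_direct(f, g, h, k, r, n):
    """Eqs (3.38)-(3.40): six k-dependent complex multipliers per butterfly."""
    return (f + w(k, n) * g + w(2 * k, n) * h,
            f + w(k + r, n) * g + w(2 * k + 2 * r, n) * h,
            f + w(k + 2 * r, n) * g + w(2 * k + 4 * r, n) * h)


# Constant butterfly multipliers for N = 3r, Eqs (3.45)-(3.47)
C1 = cmath.exp(-2j * cmath.pi / 3)  # W_N^r = W_N^4r, about -0.5 - j0.866
C2 = cmath.exp(-4j * cmath.pi / 3)  # W_N^2r,         about -0.5 + j0.866


def butterfly3_twiddle(f, g, h, k, n):
    """Figure 3.13 arrangement: twiddle G and H, then use only the constants."""
    g = w(k, n) * g      # twiddle factor W_N^k
    h = w(2 * k, n) * h  # twiddle factor W_N^2k
    return (f + g + h,
            f + C1 * g + C2 * h,
            f + C2 * g + C1 * h)


# Usage: compare both forms for N = 9 (r = 3) and k = 0, 1, 2
for k in range(3):
    print(butterfly3_direct(1, 2j, -1, k, 3, 9))
    print(butterfly3_twiddle(1, 2j, -1, k, 9))
```

Only two k-dependent exponentials are evaluated per butterfly in the twiddled form, which is the implementation advantage described above.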

3.2.3 Radix-5 Theory. The theory for the radix-5 algorithm follows a development similar to that of the radix-3. Because of this similarity, only the radix-5 results are given here for comparison with the radix-3; readers interested in the detailed development are referred to Appendix D.

The basic butterfly multipliers for the radix-5 are given by:

$$X(k) = A(k) + W_N^{k} B(k) + W_N^{2k} C(k) + W_N^{3k} D(k) + W_N^{4k} E(k) \qquad (3.48)$$

$$X(k+r) = A(k) + W_N^{k+r} B(k) + W_N^{2k+2r} C(k) + W_N^{3k+3r} D(k) + W_N^{4k+4r} E(k) \qquad (3.49)$$

$$X(k+2r) = A(k) + W_N^{k+2r} B(k) + W_N^{2k+4r} C(k) + W_N^{3k+6r} D(k) + W_N^{4k+8r} E(k) \qquad (3.50)$$

$$X(k+3r) = A(k) + W_N^{k+3r} B(k) + W_N^{2k+6r} C(k) + W_N^{3k+9r} D(k) + W_N^{4k+12r} E(k) \qquad (3.51)$$

$$X(k+4r) = A(k) + W_N^{k+4r} B(k) + W_N^{2k+8r} C(k) + W_N^{3k+12r} D(k) + W_N^{4k+16r} E(k) \qquad (3.52)$$

Eqs (3.48) through (3.52) are shown in the twiddle factor butterfly of Figure 3.14, where r is the distance between the butterfly endpoints. Since N = 5r, the butterfly multipliers reduce to the constant complex multipliers:

$$W_N^{r} = W_N^{6r} = W_N^{16r} = \cos(2\pi/5) - j\sin(2\pi/5)$$

$$W_N^{2r} = W_N^{12r} = \cos(4\pi/5) - j\sin(4\pi/5)$$

$$W_N^{3r} = \left(W_N^{2r}\right)^{*} = W_N^{8r} = \cos(4\pi/5) + j\sin(4\pi/5)$$

$$W_N^{4r} = \left(W_N^{r}\right)^{*} = W_N^{9r} = \cos(2\pi/5) + j\sin(2\pi/5)$$

These constant butterfly multipliers are computed once during the FFT computation and used in every radix-5 butterfly.
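A compact way to see Eqs (3.48) through (3.52) at work is a recursive radix-5 FFT that twiddles the inputs B(k) through E(k) by $W_N^{k}$ through $W_N^{4k}$ and then combines them with the four constant multipliers computed once. This Python sketch is an illustration under stated assumptions (hypothetical names, recursion instead of the in-place FORTRAN of Appendix D), not the program used in this study.

```python
import cmath
import math


def dft(x):
    """Reference DFT: X(k) = sum_n x(n) W_N^{nk}."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


# Constant butterfly multipliers for N = 5r, computed once and reused
W1 = complex(math.cos(2 * math.pi / 5), -math.sin(2 * math.pi / 5))  # W_N^r
W2 = complex(math.cos(4 * math.pi / 5), -math.sin(4 * math.pi / 5))  # W_N^2r
W3 = W2.conjugate()                                                   # W_N^3r
W4 = W1.conjugate()                                                   # W_N^4r
CONST = [1.0, W1, W2, W3, W4]  # CONST[e % 5] = W_N^(e*r), because W_N^(5r) = 1


def fft_radix5(x):
    """Recursive radix-5 FFT using twiddled inputs and the constants above."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 5:
        raise ValueError("length must be a power of 5")
    r = n // 5
    # A(k), B(k), C(k), D(k), E(k): the five N/5-point sub-transforms
    subs = [fft_radix5(x[s::5]) for s in range(5)]
    out = [0j] * n
    for k in range(r):
        # twiddle factor W_N^(s*k) applied to input s = 0..4
        t = [subs[s][k] * cmath.exp(-2j * cmath.pi * s * k / n) for s in range(5)]
        for q in range(5):
            # X(k + q*r) = sum over s of W_N^(s*q*r) * (twiddled input s)
            out[k + q * r] = sum(CONST[(s * q) % 5] * t[s] for s in range(5))
    return out


x25 = [complex(i % 3, (2 * i) % 5) for i in range(25)]
X25 = fft_radix5(x25)
```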

3.2.4 Digit Reversal Algorithm. In order for the DFT to be computed as discussed above, the input data must be stored in nonsequential order. In fact, the input data are stored in "bit-reversed" order for the radix-2 FFT and in "digit-reversed" order for the other fixed-radix algorithms. To see what is meant by this terminology, note that for the 8-point radix-2 flowgraph of Figure 3.8 three binary digits are required to index through the data array. Writing the input indices of $X_0$ in binary form and then reversing the order of the bits gives:

$$\begin{aligned} X_0(0) &= X_0(000) = x(000) = x(0) \\ X_0(1) &= X_0(001) = x(100) = x(4) \\ X_0(2) &= X_0(010) = x(010) = x(2) \\ X_0(3) &= X_0(011) = x(110) = x(6) \\ &\;\;\vdots \\ X_0(7) &= X_0(111) = x(111) = x(7) \end{aligned} \qquad (3.53)$$

If $(n_2\, n_1\, n_0)$ is the binary representation of the index of the sequence x(n), then the sequence value $x(n_2\, n_1\, n_0)$ is stored in array position $X_0(n_0\, n_1\, n_2)$. That is, in determining the position of $x(n_2\, n_1\, n_0)$ in the input array, the bits of index n must be reversed in order.
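The bit-reversed ordering of Eq (3.53) can be generated by reversing the log2 N binary digits of each index. The helper names below are hypothetical, and this sketch returns a reordered copy rather than swapping values in place.

```python
def bit_reverse(index, bits):
    """Reverse the order of the lowest `bits` binary digits of `index`."""
    rev = 0
    for _ in range(bits):
        rev = (rev << 1) | (index & 1)  # move the lowest bit of index into rev
        index >>= 1
    return rev


def bit_reversed_order(x):
    """Return x rearranged so that position n holds x(bit_reverse(n))."""
    n = len(x)
    bits = n.bit_length() - 1
    if n == 0 or 1 << bits != n:
        raise ValueError("length must be a power of 2")
    return [x[bit_reverse(i, bits)] for i in range(n)]


print(bit_reversed_order(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
```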

For the radix-3 FFT the input array must be in a similar nonsequential order. The order is determined by "digit reversing" the input sequence index using a modulo-3 counter. The digit-reversed radix-3 FFT example for N = 9 is shown in Figure 3.15. The modulo-3 counter is given by:

$$\text{COUNT} = (b_1 \cdot 3^1) + (b_0 \cdot 3^0) \qquad (3.54)$$

where $b_i = 0, 1, 2$. The reversed count is given by:

$$\text{REVCOUNT} = (b_0 \cdot 3^1) + (b_1 \cdot 3^0) \qquad (3.55)$$

Eqs (3.54) and (3.55) show the modulo-3 counter for N = 9, which requires only two ternary digits, $b_1$ and $b_0$, to represent the index of the input sequence. For the case where $N = 3^3 = 27$, three digits are needed to represent the index of the input sequence x(n) and the modulo-3 counter becomes:

$$\text{COUNT} = (b_2 \cdot 3^2) + (b_1 \cdot 3^1) + (b_0 \cdot 3^0) \qquad (3.56)$$

and the reverse digit counter is:

$$\text{REVCOUNT} = (b_0 \cdot 3^2) + (b_1 \cdot 3^1) + (b_2 \cdot 3^0) \qquad (3.57)$$

Similarly, the general expressions for COUNT and REVCOUNT can be given, where $N = 3^m$ and $b_i = 0, 1, 2$:

$$\text{COUNT} = (b_{m-1} \cdot 3^{m-1}) + (b_{m-2} \cdot 3^{m-2}) + \cdots + (b_1 \cdot 3^1) + (b_0 \cdot 3^0) \qquad (3.58)$$

and

$$\text{REVCOUNT} = (b_0 \cdot 3^{m-1}) + (b_1 \cdot 3^{m-2}) + \cdots + (b_{m-2} \cdot 3^1) + (b_{m-1} \cdot 3^0) \qquad (3.59)$$

Once COUNT and REVCOUNT are computed, their magnitudes are compared. If REVCOUNT is less than or equal to COUNT, no swap of the values indexed by COUNT and REVCOUNT is required (a pair with REVCOUNT < COUNT has already been exchanged); otherwise the array value indexed by COUNT is exchanged with the array value indexed by REVCOUNT. COUNT is then incremented by one, REVCOUNT is updated to its digit reversal, and the process continues until all N indices have been tested.
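The COUNT/REVCOUNT procedure can be written for any radix p. In this sketch (hypothetical function names), REVCOUNT is recomputed from COUNT at each step rather than maintained as a separate reverse-carry counter, which is simpler to read but is not necessarily how the FORTRAN programs do it. Swapping only when REVCOUNT > COUNT exchanges each pair exactly once.

```python
def digit_reverse(count, p, m):
    """REVCOUNT of Eq (3.59): reverse the m base-p digits of COUNT."""
    rev = 0
    for _ in range(m):
        count, digit = divmod(count, p)  # peel off b_0, b_1, ... in turn
        rev = rev * p + digit            # b_0 ends up with weight p^(m-1)
    return rev


def digit_reverse_in_place(a, p):
    """Reorder list `a` (length p**m) into digit-reversed order by swapping."""
    n = len(a)
    m = 0
    while p ** m < n:
        m += 1
    if p ** m != n:
        raise ValueError("length must be a power of p")
    for count in range(n):
        rev = digit_reverse(count, p, m)
        if rev > count:  # REVCOUNT <= COUNT: already swapped or a fixed point
            a[count], a[rev] = a[rev], a[count]
    return a


print(digit_reverse_in_place(list(range(9)), 3))  # [0, 3, 6, 1, 4, 7, 2, 5, 8]
```

With p = 2 the same routine performs the bit reversal of Eq (3.53).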

3.2.5 Development of a Radix-3 FFT Based on the Cube Root of Unity. This section presents the theory for a radix-3 FFT algorithm which uses the complex cube root of unity to perform the complex Fourier transformation (butterfly) without using multiplications. The benefit of this technique will also be discussed in the section on real operations count.

While the reference (Dubois and Venetsanopoulos, 1978) describes this technique, it leaves out several steps which aid in understanding the theory, and for that reason it is presented again here.

This algorithm uses basis vectors (1, u) instead of the conventional complex-plane basis vectors (1, j) to perform the complex Fourier transform, where u is a complex cube root of 1 and j is the square root of -1. The new basis vectors use the arithmetic notation:

$$a + bu \in R(u); \quad a, b \text{ real numbers} \qquad (3.60)$$

Taking u as a cube root of 1 implies:

$$u^3 - 1 = 0 \qquad (3.61)$$

or

$$(u - 1)(u^2 + u + 1) = 0 \qquad (3.62)$$

Since u ≠ 1, then

$$u^2 + u + 1 = 0 \qquad (3.63)$$

or

$$u^2 = -1 - u \qquad (3.64)$$

Eq (3.60) is used in the definition of multiplication in the R(u) field:

$$(a + bu)(c + du) = ac + bdu^2 + adu + bcu \qquad (3.65)$$

Substituting Eq (3.64) into Eq (3.65) results in:

$$(a + bu)(c + du) = (ac - bd) + \big(ad + b(c - d)\big)u \qquad (3.66)$$

The expression in Eq (3.66) can be expanded and then recombined to reduce the number of multiplications:
$$ad + b(c - d) = ad + bc - bd \qquad (3.67)$$

$$= ac + ad + bc + bd - ac - bd - bd \qquad (3.68)$$

$$= (a + b)(c + d) - ac - bd - bd \qquad (3.69)$$

Substituting Eq (3.69) into Eq (3.66) gives:

$$(a + bu)(c + du) = (ac - bd) + \big((a + b)(c + d) - ac - bd - bd\big)u \qquad (3.70)$$

The result in Eq (3.70) requires three real multiplications and six real additions, compared with conventional complex multiplication, which requires four real multiplications and two real additions. Multiplication in the R(u) field therefore requires one fewer multiplication but four more additions.
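The R(u) multiplication of Eq (3.70) can be sketched as a small value type. The class name `RU` is hypothetical; the multiply uses exactly the three real products ac, bd, and (a+b)(c+d), and the remaining work is additions and subtractions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RU:
    """Element a + b*u of R(u), where u is a complex cube root of 1 and u^2 = -1 - u."""
    a: float
    b: float

    def __add__(self, other):
        return RU(self.a + other.a, self.b + other.b)

    def __sub__(self, other):
        return RU(self.a - other.a, self.b - other.b)

    def __mul__(self, other):
        # Eq (3.70): three real multiplications, six real additions/subtractions
        ac = self.a * other.a
        bd = self.b * other.b
        s = (self.a + self.b) * (other.a + other.b)
        return RU(ac - bd, s - ac - bd - bd)


u = RU(0, 1)
print(u * u)      # RU(a=-1, b=-1), i.e. u^2 = -1 - u
print(u * u * u)  # RU(a=1, b=0), i.e. u^3 = 1
```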

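The expression for u is obtained from $u^3 = 1$ by letting $u^3 = \big(\exp(-j2\pi/3)\big)^3 = 1$. Consequently, $u = \exp(-j2\pi/3) = -1/2 - j(\sqrt{3}/2)$, which is used for conversion between a + bj and c + du:

$$c + du = c + d\left(-\tfrac{1}{2} - j\tfrac{\sqrt{3}}{2}\right) = c - \tfrac{d}{2} - j\tfrac{\sqrt{3}}{2}d$$

$$c + du = \left(c - \tfrac{d}{2}\right) + j\left(-\tfrac{\sqrt{3}}{2}\right)d$$

To find the conversion from a + bj to c + du, solve the last expression for j:

$$\begin{aligned} c + du &= \left(c - \tfrac{d}{2}\right) + \left(-\tfrac{\sqrt{3}}{2}\right)d\,j \\ \tfrac{d}{2} + du &= \left(-\tfrac{\sqrt{3}}{2}\right)d\,j \\ d\left(\tfrac{1}{2} + u\right) &= \left(-\tfrac{\sqrt{3}}{2}\right)d\,j \\ \tfrac{1}{2} + u &= \left(-\tfrac{\sqrt{3}}{2}\right)j \\ j &= \left(-\tfrac{2}{\sqrt{3}}\right)\left(\tfrac{1}{2} + u\right) \end{aligned} \qquad (3.73)$$

Using Eq (3.73) in a + bj, the conversion to c + du is:

$$\begin{aligned} a + bj &= a + b\left(-\tfrac{2}{\sqrt{3}}\right)\left(\tfrac{1}{2} + u\right) \\ &= a + b\left(-\tfrac{2}{\sqrt{3}}\right)\left(\tfrac{1}{2}\right) + b\left(-\tfrac{2}{\sqrt{3}}\right)u \\ a + bj &= \left(a - \tfrac{b}{\sqrt{3}}\right) + \left(-\tfrac{2b}{\sqrt{3}}\right)u \end{aligned} \qquad (3.74)$$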
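The two conversions can be checked numerically. The function names are hypothetical; `complex_to_ru` implements Eq (3.74) and `ru_to_complex` the inverse relation c + du = (c - d/2) + j(-√3/2)d derived above. Converting $W_N^r$ with N = 3r should give approximately (0, 1), i.e. u itself.

```python
import cmath
import math

SQRT3 = math.sqrt(3.0)
U = cmath.exp(-2j * math.pi / 3)  # u = -1/2 - j*sqrt(3)/2


def ru_to_complex(c, d):
    """c + du  ->  (c - d/2) + j(-sqrt(3)/2) d."""
    return complex(c - d / 2, -SQRT3 / 2 * d)


def complex_to_ru(z):
    """a + bj  ->  (a - b/sqrt(3)) + (-2b/sqrt(3)) u, Eq (3.74)."""
    a, b = z.real, z.imag
    return a - b / SQRT3, -2.0 * b / SQRT3


# W_N^r with N = 3r maps to (c, d) close to (0, 1), i.e. u
c, d = complex_to_ru(cmath.exp(-2j * math.pi / 3))
```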

Using the R(u) arithmetic developed above, it can be shown that a radix-3 FFT butterfly can be developed which requires no multiplications except for the twiddle factors in Figure 3.13.

Using Eq (3.74) and $W_N^{r} = \cos(2\pi r/N) + j\big(-\sin(2\pi r/N)\big)$ produces:

$$W_N^{r} = c + du = \left(\cos(2\pi r/N) + \frac{\sin(2\pi r/N)}{\sqrt{3}}\right) + \left(\frac{2\sin(2\pi r/N)}{\sqrt{3}}\right)u \qquad (3.75)$$

Using the substitution N = 3r in Eq (3.75) reduces it to:

$$W_N^{r} = \left(\cos(2\pi/3) + \frac{\sin(2\pi/3)}{\sqrt{3}}\right) + \left(\frac{2\sin(2\pi/3)}{\sqrt{3}}\right)u$$

$$W_N^{r} = 0 + 1u = u \qquad (3.76)$$

Likewise the remaining W terms in Figure 3.13 can be reduced:

$$W_N^{2r} = \left(\cos(4\pi/3) + \frac{\sin(4\pi/3)}{\sqrt{3}}\right) + \left(\frac{2\sin(4\pi/3)}{\sqrt{3}}\right)u$$

$$W_N^{2r} = -1 - 1u \qquad (3.77)$$

$$W_N^{4r} = 0 + 1u = u \qquad (3.78)$$

Substituting Eqs (3.76) through (3.78) into Figure 3.13 produces Figure 3.16.
Using the R(u) field arithmetic, only additions and no multiplications are required to evaluate the butterfly flowgraph. In the following, $X_i + Y_i u$ (i = 1, 2, 3) are the butterfly inputs after twiddle factor multiplication and $A(\cdot) + B(\cdot)u$ are the butterfly outputs in the R(u) field.

$$A(1) + B(1)u = (X_1 + X_2 + X_3) + (Y_1 + Y_2 + Y_3)u \qquad (3.79)$$

$$\begin{aligned} A(2) + B(2)u &= (X_2 + Y_2 u)(0 + u) + (X_3 + Y_3 u)(-1 - u) + (X_1 + Y_1 u) \\ &= (-Y_2) + (X_2 - Y_2)u + (-X_3 + Y_3) + (-X_3)u + X_1 + Y_1 u \\ &= (X_1 - Y_2 - X_3 + Y_3) + (Y_1 + X_2 - Y_2 - X_3)u \end{aligned} \qquad (3.80)$$

$$\begin{aligned} A(3) + B(3)u &= X_1 + Y_1 u + (X_2 + Y_2 u)(-1 - u) + (X_3 + Y_3 u)(0 + u) \\ &= X_1 + Y_1 u + (-X_2 + Y_2) + (-X_2)u + (-Y_3) + (X_3 - Y_3)u \\ &= (X_1 - X_2 + Y_2 - Y_3) + (Y_1 - X_2 + X_3 - Y_3)u \end{aligned} \qquad (3.81)$$

There are 16 real additions in Eqs (3.79) through (3.81); however, by combining the common terms $-Y_2 - X_3 = -R$ and $-X_2 - Y_3 = -S$, the radix-3 butterfly can be evaluated using only fourteen real additions (neglecting the twiddle factors):

$$\begin{aligned} A(1) &= X_1 + X_2 + X_3 \\ B(1) &= Y_1 + Y_2 + Y_3 \\ A(2) &= X_1 + Y_3 - R, \quad \text{where } R = Y_2 + X_3 \\ B(2) &= Y_1 + X_2 - R \\ A(3) &= X_1 + Y_2 - S, \quad \text{where } S = X_2 + Y_3 \\ B(3) &= Y_1 + X_3 - S \end{aligned}$$
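The fourteen-addition butterfly can be verified against a direct 3-point DFT by converting complex inputs into R(u) with Eq (3.74), applying the additions, and converting back. This is a hypothetical sketch (function names are illustrative); it assumes the inputs have already been multiplied by their twiddle factors.

```python
import cmath
import math

SQRT3 = math.sqrt(3.0)


def complex_to_ru(z):
    # Eq (3.74): a + bj = (a - b/sqrt(3)) + (-2b/sqrt(3)) u
    return z.real - z.imag / SQRT3, -2.0 * z.imag / SQRT3


def ru_to_complex(c, d):
    # c + du = (c - d/2) + j(-sqrt(3)/2) d
    return complex(c - d / 2, -SQRT3 / 2 * d)


def butterfly3_ru(x1, y1, x2, y2, x3, y3):
    """Radix-3 butterfly in R(u): 14 real additions, no multiplications.
    Inputs X_i + Y_i u are the butterfly inputs after twiddle multiplication."""
    r = y2 + x3
    s = x2 + y3
    return ((x1 + x2 + x3, y1 + y2 + y3),  # A(1) + B(1) u
            (x1 + y3 - r, y1 + x2 - r),    # A(2) + B(2) u
            (x1 + y2 - s, y1 + x3 - s))    # A(3) + B(3) u


# Usage: a 3-point DFT of three complex values computed through R(u)
z = [1 + 2j, -0.5 + 1j, 3 - 1j]
ins = [v for zi in z for v in complex_to_ru(zi)]  # X1, Y1, X2, Y2, X3, Y3
outs = [ru_to_complex(a, b) for a, b in butterfly3_ru(*ins)]

W = cmath.exp(-2j * math.pi / 3)  # W_3
ref = [z[0] + z[1] + z[2],
       z[0] + W * z[1] + W ** 2 * z[2],
       z[0] + W ** 2 * z[1] + W ** 4 * z[2]]
```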

3.2.6 Summary. This completes the discussion of fixed radix FFT theory. In this section the general theory was developed using the radix-3 case as an alternative to the more common radix-2 development. A decimation-in-time example for N = 9 was shown and the basic butterfly equations for radix-3 were derived. Because of the similarity to radix-3 butterflies, the radix-5 theory was not developed, but the butterfly equations necessary to implement a radix-5 FFT were given. Finally, a new radix-3 FFT (Dubois and Venetsanopoulos, 1978) was developed.

3.3 Real Operations Count for Fixed Radix FFTs

The speed at which an FFT algorithm can perform the DFT is (to a first approximation) proportional to the number of complex multiplications used in the algorithm (Singleton, 1969). The number of times the data array is indexed is a secondary factor and is shown to have minimal impact on the results of this paper.

An anomaly in the nomenclature should be pointed out before further discussion of "complex multiplications" related to FFTs. A complex multiplication implies four real multiplications and two real additions. It has been shown (Singleton, 1969) that (p-1)^2 real multiplications are required to evaluate a complex transform of dimension p, p odd, where N = p^m. Singleton then refers to these (p-1)^2 real multiplications as (p-1)^2 "complex multiplications", even though a complex transform of dimension p also requires more than (p-1)^2/2 real additions. Throughout this paper all references to multiplications and additions are in terms of real operations and not complex operations.

The real operations are determined from (1) the number of butterflies times the number of real operations required to compute each butterfly, (2) the number of twiddle factors times the real operations required per twiddle factor, and (3) the number of trigonometric functions (sine and cosine) which must be computed. The real operations counts for the radix-p FFTs are derived as a function of N, m, and p, where N = p^m.

3.3.1 Number of Butterflies in Fixed Radix-p FFTs. The number of butterflies depends on N, m, and p, where N = p^m. Examining the radix-2 FFT in Figure 3.8 shows that there are 8 input points and 8 output points for each stage. The radix-2 butterfly in Figure 3.7 has 2 input and 2 output points, which means that Figure 3.8 must have 8/2 = 4 butterflies per stage. There are 3 stages in this radix-2 FFT (where N = 2^3), giving a total of 12 butterflies in this FFT.

In general the number of radix-p butterflies is given by:

$$mN/p \qquad (3.82)$$

This equation can be checked for the radix-3 example. Given that N = 9, p = 3, and m = 2, Eq (3.82) gives the total number of butterflies as 2 · 9/3 = 6. This is verified by Figure 3.11, which has 6 radix-3 butterflies.

3.3.2 Number of Twiddle Factors in Fixed Radix-p FFTs. The twiddle factors are complex multipliers of the form exp(-j2πrk/N) which multiply each radix-p butterfly as shown in Figure 3.8. Notice that each stage has N/p = 8/2 = 4 butterflies, each of which requires p-1 = 2-1 = 1 complex twiddle factor. The general expression for the number of twiddle factors in each stage becomes:

$$N(p-1)/p \qquad (3.84)$$

Given that N = p^m, there are m stages in a radix-p FFT, making the total number of twiddle factors for the FFT equal to:

$$mN(p-1)/p \qquad (3.85)$$

Some of the complex twiddle factors are $W_N^0 = 1$ and can be eliminated. In any FFT there are N-1 of these unity twiddle factors (Singleton, 1969), which gives the final expression for the number of complex twiddle factors as:

$$mN(p-1)/p - (N-1) \qquad (3.86)$$

Using N = p^m = 2^3 = 8 in Eq (3.86), the number of twiddle factors is found to be 5. Examining Figure 3.8 for N = 2^3 shows there are 5 non-unity twiddle factors.
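Eqs (3.82) and (3.86) can be cross-checked by enumerating the twiddle factors stage by stage. The enumeration assumes a decimation-in-time structure in which stage s of a radix-p FFT has N/p^s groups, each applying $W_{p^s}^{qk}$ for q = 1, ..., p-1 and k = 0, ..., p^(s-1)-1; this stage layout and the function names are assumptions made for the illustration.

```python
def stages(n, p):
    """Return m with n = p**m; raise ValueError otherwise."""
    m, size = 0, 1
    while size < n:
        size *= p
        m += 1
    if size != n:
        raise ValueError(f"{n} is not a power of {p}")
    return m


def butterfly_count(n, p):
    """Eq (3.82): mN/p radix-p butterflies."""
    return stages(n, p) * n // p


def twiddle_count(n, p):
    """Eq (3.86): mN(p-1)/p - (N-1) non-unity complex twiddle factors."""
    return stages(n, p) * n * (p - 1) // p - (n - 1)


def twiddle_count_by_enumeration(n, p):
    """Count twiddle factors W != 1 under the assumed stage layout."""
    count = 0
    for s in range(1, stages(n, p) + 1):
        size = p ** s
        for _group in range(n // size):
            for k in range(size // p):
                for q in range(1, p):
                    if (q * k) % size != 0:  # exponent not a multiple of p^s
                        count += 1
    return count


print(butterfly_count(8, 2), twiddle_count(8, 2), twiddle_count_by_enumeration(8, 2))  # 12 5 5
print(butterfly_count(9, 3))  # 6
```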

3.3.3 Number of Trigonometric Functions Required for the Fixed Radix Algorithms. The trigonometric functions sine and cosine are needed to compute the twiddle factors. The fixed radix-2 algorithm uses calls to the FORTRAN library SIN and COS functions as well as the difference equations given in Section 3.1. The radix-3 and radix-5 FFTs use only the difference equations.

The radix-2 algorithm in Appendix A computes one sine and cosine at each stage of the FFT using:

W = CMPLX(COS(PI/LE1), SIN(PI/LE1))

Each radix-2 FFT has m stages, where N = 2^m, which means the sine and cosine functions are called m times for the FFT. Once the initial sine and cosine are computed for the stage, each new twiddle factor in the stage is computed using the complex multiplication:

U = U * W

where the complex U was originally initialized to U = (1,0). The complex multiplication U * W effectively implements the sine and cosine difference equations in Section 3.1.

The number of times U * W is computed for each FFT stage is a function of the number of different twiddle factors in the stage. In Figure 3.8 the first stage has only one type of twiddle factor, $W_8^0$; the second stage has two types, $W_8^0$ and $W_8^2$; while stage 3 has four: $W_8^0$, $W_8^1$, $W_8^2$, and $W_8^3$. The general expression for the number of types of twiddle factors in stage k is:

$$TF = 2^{k-1}$$

Thus for stage 1, k = 1 and $TF = 2^{1-1} = 1$, which gives one type of twiddle factor; for stage 2, k = 2 and $TF = 2^{2-1} = 2$, giving two types of twiddle factors; and finally for the last stage in this example, k = 3 and $TF = 2^{3-1} = 4$, so four types of twiddle factors are required. In general, for the radix-2 FFT in Appendix A the complex multiplication U * W is evaluated a total of

$$\sum_{k=1}^{m} 2^{k-1} = 2^m - 1 = N - 1$$

times, where m is the number of stages for $N = 2^m$. Given that each complex multiplication requires 4 real multiplications and 2 real additions, the number of operations required to compute sines and cosines for this radix-2 FFT is:

$$\text{real mult} = 4 \sum_{k=1}^{m} 2^{k-1} \qquad (3.87)$$

$$\text{real add} = 2 \sum_{k=1}^{m} 2^{k-1} \qquad (3.88)$$

$$\text{sine and cosine calls} = m \qquad (3.89)$$
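The per-stage twiddle generation can be simulated to confirm the counts in Eqs (3.87) through (3.89). This Python sketch mirrors the scheme described above (one sine/cosine pair per stage, then U = U * W once per twiddle-factor type); the function name is hypothetical, and the minus sign on the sine is an assumption chosen so that W matches $W_N = \exp(-j2\pi/N)$.

```python
import math


def radix2_twiddle_generation(n):
    """Simulate per-stage twiddle generation for an n-point radix-2 FFT.

    Returns (twiddles per stage, sine/cosine calls, U*W products)."""
    m = n.bit_length() - 1
    if n < 2 or 1 << m != n:
        raise ValueError("n must be a power of 2 and at least 2")
    tables, trig_calls, products = [], 0, 0
    for k in range(1, m + 1):
        le1 = 2 ** (k - 1)  # TF = 2^(k-1) twiddle-factor types in stage k
        # one sine/cosine pair per stage (sign assumed: W = exp(-j*pi/le1))
        w = complex(math.cos(math.pi / le1), -math.sin(math.pi / le1))
        trig_calls += 1
        u = complex(1.0, 0.0)  # U initialized to (1, 0)
        types = []
        for _ in range(le1):
            types.append(u)
            u = u * w  # U = U * W
            products += 1
        tables.append(types)
    return tables, trig_calls, products


tables, calls, products = radix2_twiddle_generation(8)
print(calls, products, 4 * products, 2 * products)  # 3 7 28 14
```

The last two printed values are the real multiplications and additions of Eqs (3.87) and (3.88) for N = 8.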

The computation of the sine and cosine lookup tables for the radix-3 and radix-5 algorithms is less complex than for the radix-2 FFT. In these algorithms the difference equations from Section 3.1 are used to compute sine and cosine lookup tables of length N. Because of the symmetry sin(k) = -sin(-k), only N/2 computations of the difference equations are required. The equations are given by:

WKC(I) = C * WKC(I-1) - S * WKS(I-1) + WKC(I-1)
WKS(I) = C * WKS(I-1) + S * WKC(I-1) + WKS(I-1)

which need a total of 4 real multiplications and 10 additions to compute. For an N-length sequence, computing the lookup tables requires:

$$\text{real mult} = 4(N/2) = 2N \qquad (3.90)$$

$$\text{real add} = 10(N/2) = 5N \qquad (3.91)$$
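The lookup-table recursion can be sketched as follows. The section does not define C and S; this sketch assumes C = cos(2π/N) - 1 and S = sin(2π/N), which makes the two difference equations reproduce cos(2πI/N) and sin(2πI/N). Only the first N/2 entries are generated by recursion and the rest are filled by symmetry; the function name is hypothetical.

```python
import math


def trig_tables(n):
    """Cosine/sine tables of length n from the difference equations, assuming
    C = cos(theta) - 1 and S = sin(theta) with theta = 2*pi/n."""
    theta = 2.0 * math.pi / n
    c = math.cos(theta) - 1.0  # assumed definition of C
    s = math.sin(theta)        # assumed definition of S
    wkc = [0.0] * n
    wks = [0.0] * n
    wkc[0], wks[0] = 1.0, 0.0
    half = n // 2
    for i in range(1, half + 1):  # about n/2 steps, 4 real multiplications each
        wkc[i] = c * wkc[i - 1] - s * wks[i - 1] + wkc[i - 1]
        wks[i] = c * wks[i - 1] + s * wkc[i - 1] + wks[i - 1]
    for i in range(half + 1, n):  # remaining entries from symmetry
        wkc[i] = wkc[n - i]       # cos(2*pi*(n-i)/n) = cos(2*pi*i/n)
        wks[i] = -wks[n - i]      # sin(2*pi*(n-i)/n) = -sin(2*pi*i/n)
    return wkc, wks


cos9, sin9 = trig_tables(9)
```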

3.3.4 Number of Real Operations in Radix-p FFTs. Based on the general expressions in Eqs (3.82) through (3.91), the total numbers of real multiplications and additions can be determined given N = p^m, where N, p, and m are integers. First, each radix-p butterfly computation requires multiplications or additions or both to be evaluated; the exact number of multiplies and adds is determined from the FORTRAN code as shown below. Second, each complex twiddle factor multiplication requires 4 real multiplications and 2 real additions. Third, the number of real operations to compute the sines and cosines is added to the butterflies and twiddle factors to give the total operations count for each algorithm.

For the case of N = 2^m, it was shown in Section 3.2.1 that the radix-2 butterfly can be computed with 4 real additions and no multiplications. However, this radix-2 FFT does not eliminate the trivial multiplications by $W_N^0$. Therefore each radix-2 butterfly is multiplied by a complex twiddle factor as shown in Figure 3.8, and for this particular radix-2 FFT the number of twiddle factors equals the number of butterflies. Combining all sources of real operations for the radix-2 FFT gives a total of:


$$\text{real mult} = (\text{\# butterflies})(\text{\# mult per butterfly}) + 4(\text{\# twiddle factors}) + 4(\text{\# types of twiddle factors}) \qquad (3.92)$$

Substituting the appropriate values for the radix-2 gives:

$$\text{real mult} = (0)(mN/2) + 4(mN/2) + 4\sum_{k=1}^{m} 2^{k-1} = 2mN + 4\sum_{k=1}^{m} 2^{k-1} \qquad (3.93)$$


Likewise for the number of real additions:

$$\text{real adds} = (\text{\# adds per butterfly})(\text{\# butterflies}) + 2(\text{\# twiddle factors}) + 2(\text{\# types of twiddle factors}) \qquad (3.94)$$

$$\text{real adds} = 4(mN/2) + 2(mN/2) + 2\sum_{k=1}^{m} 2^{k-1} = 3mN + 2\sum_{k=1}^{m} 2^{k-1} \qquad (3.95)$$


For the radix-p FFTs, where p is an odd prime, it has been shown by Singleton (1969) that these butterflies can be evaluated using (p-1)^2 real multiplications. The FORTRAN-coded radix-3 and radix-5 FFTs in Appendices B and D require 4 real multiplications and 12 additions for radix-3 butterflies and 16 real multiplications and 30 additions for radix-5 butterflies. Using these values together with Eqs (3.82), (3.86), (3.90), and (3.91) yields the total real operations for the radix-3 as:
$$\begin{aligned} \text{real mult} &= (4 \text{ mult per butterfly})(mN/3) + 4\big(mN(3-1)/3 - (N-1)\big) + 2N \\ &= 4mN/3 + 8mN/3 - 4(N-1) + 2N \\ &= 4mN - 4(N-1) + 2N \end{aligned} \qquad (3.96)$$

$$\begin{aligned} \text{real adds} &= (12 \text{ adds per butterfly})(mN/3) + 2\big(mN(3-1)/3 - (N-1)\big) + 5N \\ &= 12mN/3 + 4mN/3 - 2(N-1) + 5N \\ &= 16mN/3 - 2(N-1) + 5N \end{aligned} \qquad (3.97)$$

Similarly  the  real  operations  count  for  the  radix-5  FFT 
becomes : 

real  mult  =  (16  mult  per  butterfly)  *  mN/5 
+  4 (mN ( 5-1) /5  -  (N-l))  +  2N 
=  16mN/5  +  16mN/5  -  4  (N-l)  +  2N 
=  32mN/5  -  4 (N-l)  +  2N  (3.98) 

real  adds  =  (30  adds  per  butterfly)  *  mN/5 
4  2 ( mN ( 5 - 1 ) / 5  -  (N-l))  +  5N 
=  30mN/5  +  8 mN/5  -  2  (N-l)  +  5N 
=  38mN/5  -  2  (N-l)  +  5N  (3.99) 
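The same bookkeeping for the radix-3 and radix-5 cases can be sketched as follows, assuming the per-butterfly costs quoted above (4 multiplications and 12 additions for radix-3, 16 and 30 for radix-5) and carrying the 2N and 5N terms of Eqs (3.96) through (3.99) over unchanged. The helper name is hypothetical.

```python
def odd_radix_real_ops(N, p):
    """Real (multiplications, additions) for a fixed radix-p FFT, p in {3, 5},
    N = p**m, following Eqs (3.96)-(3.99)."""
    per_butterfly = {3: (4, 12), 5: (16, 30)}   # (mults, adds) per butterfly
    if p not in per_butterfly:
        raise ValueError("only radix 3 and radix 5 are covered")
    m, n = 0, N
    while n > 1 and n % p == 0:
        n //= p
        m += 1
    if n != 1 or m == 0:
        raise ValueError("N must be a positive power of p")
    bm, ba = per_butterfly[p]
    butterflies = m * N // p
    twiddles = m * N * (p - 1) // p - (N - 1)    # mN(p-1)/p - (N-1)
    mult = bm * butterflies + 4 * twiddles + 2 * N
    adds = ba * butterflies + 2 * twiddles + 5 * N
    return mult, adds
```

For example, `odd_radix_real_ops(9, 3)` returns (58, 125) and `odd_radix_real_ops(25, 5)` returns (274, 457), matching Table 3.1.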

The results of Eqs (3.92) through (3.99) are given in Table
3.1 for N between 8 and 16,000. This table also summarizes the
possible values of N for the fixed radix-2, 3, and 5 FFTs.

TABLE 3.1

REAL OPERATIONS COUNT

      N    Radix    Multiplications    Additions
      8    2^3                  76            86
      9    3^2                  58           125
     16    2^4                 188           222
     25    5^2                 274           457
     27    3^3                 274           515
     32    2^5                 444           542
     64    2^6                1020          1278
     81    3^4                1138          1973
    125    5^3                2154          3227
    128    2^7                2300          2942
    243    3^5                4378          7211
    256    2^8                5116          6654
    512    2^9               11260         14846
    625    5^4               14754         20877
    729    3^6               16042         25517
   1024    2^10              24572         32766
   2048    2^11              53244         71678
   2187    3^7               56866         88211
   3125    5^5               93754        128127
   4096    2^12             114684        155646
   6561    3^8              196834        299621
   8192    2^13             245756        335870
  15625    5^6              568754        759377

3.3.5 Real Operations Count for the Radix-3 FFT Using the
Complex Cube Root of Unity. This algorithm represents an
alternative to the conventional radix-3 FFT. It is shown in this
section that selective use of this algorithm can reduce the
number of real operations, depending on the sequence length.

The radix-3 FFT in the R(u) field has four sources of real
multiplications (where N=3^m):

1. 2mN/3 - (N-1) complex twiddle factors derived in
Section 3.3.3.

2. Conversion from complex to R(u) of $\sum_{i=2}^{m} 2(3^{i-1} - 1)$
twiddle factors, derived from the FORTRAN code in Appendix C.

3. Conversion of the complex array of length N to the R(u)
field, derived from the FORTRAN code.

4. Conversion of the R(u) array of length N back to the
complex field, derived from the FORTRAN code.

The radix-3 in R(u) has five sources of real additions:

1. mN/3 butterflies derived in Section 3.3.3.

2. The four sources of real multiplies listed above.

Based on the FORTRAN code in Appendix C, there are three
real multiplications per complex twiddle factor, two per
twiddle factor conversion, two per conversion from complex
to the R(u) field, and two per conversion from R(u) to the
complex field. Condensing the above into an equation for
real multiplications yields:


$$\text{real mult} = 3(2mN/3 - N + 1) + 2\sum_{i=2}^{m} 2(3^{i-1} - 1) + 4N \tag{3.100}$$

There are 14 real additions per butterfly, six per twiddle
factor, one per twiddle factor conversion, one per conversion
to the R(u) array, and one per conversion to the complex array.
Expressing the total number of real additions as a function of
the above yields:


$$\text{real adds} = 14mN/3 + 6(2mN/3 - N + 1) + \sum_{i=2}^{m} 2(3^{i-1} - 1) + 2N \tag{3.101}$$

The results for the number of real multiplications and
additions for both radix-3 algorithms are given in Table 3.2
for N=27 to N=19683. Because the R(u) radix-3 requires more
multiplications and additions for N=27 and 81, it will always
run slower than the complex field radix-3 FFT at these lengths.
But for N=243 and higher the R(u) radix-3 may run faster,
depending upon the speed of additions relative to
multiplications on the computer being used to perform the FFTs.

Table 3.2 also gives the "Add to Multiply Ratio" required
for the R(u) field radix-3 FFT to run faster than the
conventional radix-3 FFT. (The ratio is the difference in the
number of additions divided by the difference in the number of
multiplications.) For the case of N=729, a multiply operation
must take more than 3.77 times as long as an addition before
the R(u) field radix-3 can run faster than the complex field
radix-3. This means that prior to selecting either of the
algorithms, the relative costs of additions and multiplications
must be known, as well as the length of the data sequence.
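The comparison can be reproduced from Eqs (3.96), (3.97), (3.100), and (3.101). The sketch below drops the 2N and 5N sine/cosine terms from the complex-field counts, as Table 3.2 does, and computes the break-even ratio as extra additions divided by saved multiplications. The function names are hypothetical; the checks use the Table 3.2 rows from N = 243 upward.

```python
def radix3_log(N):
    """Return m for N = 3**m (m >= 1)."""
    m, n = 0, N
    while n > 1 and n % 3 == 0:
        n //= 3
        m += 1
    if n != 1 or m == 0:
        raise ValueError("N must be a positive power of 3")
    return m


def radix3_comparison(N):
    """((complex mult, complex adds), (R(u) mult, R(u) adds)), sine/cosine excluded."""
    m = radix3_log(N)
    twiddles = 2 * m * N // 3 - (N - 1)                       # 2mN/3 - N + 1
    conversions = sum(2 * (3 ** (i - 1) - 1) for i in range(2, m + 1))
    complex_ops = (4 * m * N - 4 * (N - 1),                   # Eq (3.96) without 2N
                   16 * m * N // 3 - 2 * (N - 1))             # Eq (3.97) without 5N
    ru_ops = (3 * twiddles + 2 * conversions + 4 * N,         # Eq (3.100)
              14 * m * N // 3 + 6 * twiddles + conversions + 2 * N)  # Eq (3.101)
    return complex_ops, ru_ops


def add_to_mult_ratio(N):
    """Multiply/add time ratio above which the R(u) version wins; None if it saves no multiplies."""
    (cm, ca), (rm, ra) = radix3_comparison(N)
    saved_mults, extra_adds = cm - rm, ra - ca
    if saved_mults <= 0:
        return None
    return extra_adds / saved_mults
```

With these definitions, `add_to_mult_ratio(729)` is about 3.77, the break-even figure quoted above.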

TABLE 3.2

COMPARISON BETWEEN COMPLEX AND R(u) RADIX-3 FFT FOR REAL OPERATIONS*

              Complex Radix-3          R(u) Radix-3           Add to
      N    Real Mult   Real Adds    Real Mult   Real Adds    Mult Ratio
     27          220         380          232         624        NA
     81          976        1568         1284        2562        NA
    243         3892        5996         3140        9796       5.05
    729        14584       21872        10912       35714       3.77
   2187        52492       77276        37152      126108       3.18
   6561       183712      266816       124628      435202       2.85
  19683       629860      905420       413308     1476212       2.63

* Does not include computing sine and cosine terms.

3.3.6 Memory Requirements for Fixed Radix FFTs. A major
consideration in selecting a particular FFT algorithm is the
sequence length and the memory required to execute the
subroutine relative to the memory available in the computer.
For this reason the memory requirements for the radix-2, 3, and
5 FFTs are given here as a function of sequence length N. The
program memory and data array storage requirements for each
algorithm are enumerated below.

The program memory required by each routine was
determined from a "load map" generated by the command MAP,PART.
The array storage requirements were determined by inspection of
the DIMENSION statements in the FORTRAN code for each subroutine
listed in Appendices A to D. The results are:

FFT                Program    Arrays
Radix-2              108      2N
Radix-3              301      4N + M + 30
Radix-3 in R(u)      396      4N + M + 30
Radix-5              458      4N + M + 30

where M is the exponent m in N = 3^m or N = 5^m.

The memory arrays required for each algorithm as a
function of N are listed in Table 3.3. The program memory
was not included because it depends on the machine word
size, which varies from machine to machine.

3.4 Mixed Radix FFT Algorithms

Up to this point only fixed radix FFTs have been
discussed. Explanation and programming for the special
cases where N=2^m, 3^m, or 5^m are simpler than for the general
case of N=p_1 p_2 ... p_m, and for most applications the
restricted choice of values is adequate. However, when the
application does not permit "zero-packing" of the data sequence
to reach one of the special cases, a wider choice of N is needed.
TABLE 3.3

FIXED RADIX MEMORY REQUIRED

      N    Memory Arrays
      8         16
      9         68
     16         32
     25        132
     27        141
     32         64
     64        128
     81        358
    125        533
    128        256
    243       1007
    256        512
    512       1024
    625       2534
    729       2952
   1024       2048
   2048       4096
   2187       8820


Singleton first published a mixed radix FFT algorithm
in June 1969; it has been widely used and implemented on
large and small computers. This algorithm is listed in
Appendix F. (The International Mathematical and Statistical
Libraries (IMSL) package, available on the WPAFB CDC Cyber 74
computer, includes a mixed radix FFT based on Singleton's work.)
The author has also written and tested a mixed radix algorithm,
which is listed in Appendix E. The theory, digit reversal,
real operations count, and memory requirements for these
algorithms are discussed in the following sections.

3.4.1 Mixed Radix Theory. All FFT theory can be
developed by representing a one-dimensional sequence of length
N as several two-dimensional matrices and performing operations
on these matrices. This approach is difficult to follow on first
exposure. For this reason the matrix development is presented
here, and then a specific example of N=30 is treated to increase
understanding of the technique.

The complex Fourier transform is defined as:

$$X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j2\pi nk/N) \tag{3.102}$$

for k = 0, 1, ..., N-1, where X(k) and x(n) are both complex
valued. Eq (3.102) can be expressed as a matrix multiplication:
X = Tx.

The matrix T can be decimated in time (Cooley and Tukey,
1965) or in frequency (Gentleman and Sande, 1966) to produce
equally efficient factorings:

$$T = P F_m F_{m-1} \cdots F_2 F_1$$

where F_i is the decomposition corresponding to the factor n_i
of

$$N = \prod_{i=1}^{m} n_i$$

and P is the permutation (digit reversal) matrix (Singleton,
1969). The matrix F_i has only n_i nonzero elements in each
row and column and can also be partitioned into N/n_i square
submatrices of dimension n_i; it is this partition that is
the basis for these (mixed radix) "algorithms" (Singleton,
1969). The matrices F_i can be further factored into:

$$F_i = R_i T_i \tag{3.103}$$

where R_i is the diagonal matrix of twiddle (rotation) factors.
Using these twiddle factors enables the trigonometric
symmetries and complex multipliers (e.g., e^{jπ}, e^{jπ/2}, ...)
to be exploited in the FFT butterflies, reducing the number of
real operations. A specific decimation-in-time example that uses
the above ideas is now considered.

Given an N-point sequence for which the N-point DFT
is desired, the integer N can be factored into a product of
smaller integers, assuming N is not prime. The successive
factorization of one number into two can result in any
possible combination. If N=30, it can be factored as 5·6 and
then as 5·3·2. The first decomposition is shown in Figure 3.17
and is represented as six 5-point DFTs followed by five 6-point
DFTs. The next stage of decomposition is from 5·6 to 5·3·2
and is shown in Figure 3.18. Starting with the DFT expression
in Eq (3.102), the sequence can be factored into N = p·q = 5·6
(representing a 5 by 6 matrix) and the expression becomes:

$$X(k) = \sum_{m=0}^{p-1} W_N^{mk} \sum_{r=0}^{q-1} x(pr+m)\, W_N^{prk} \tag{3.104}$$

Now the inner sums can be expressed as the q-point DFTs
G_m(k):

$$G_m(k) = \sum_{r=0}^{q-1} x(pr+m)\, W_q^{rk} \tag{3.105}$$

since

$$W_N^{prk} = \exp(-j2\pi prk/N) = \exp(-j2\pi prk/pq) = W_q^{rk} \tag{3.106}$$

Using p=5 and q=6 in Eq (3.104) produces:

$$X(k) = \sum_{m=0}^{4} W_{30}^{mk} \sum_{r=0}^{5} x(5r+m)\, W_{30}^{5rk} \tag{3.107}$$
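Eqs (3.104) through (3.106) can be checked numerically. The Python sketch below, with hypothetical helper names, computes the p inner q-point DFTs G_m once, reuses G_m(k) through k mod q (G_m has period q in k), and combines them with the W_N^{mk} factors; the result is compared against direct evaluation of Eq (3.102).

```python
import cmath


def dft(x):
    """Direct evaluation of Eq (3.102)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def two_factor_dft(x, p, q):
    """DFT of length N = p*q via Eq (3.104): p inner q-point DFTs, then twiddle and sum."""
    N = len(x)
    if N != p * q:
        raise ValueError("len(x) must equal p*q")
    # G[m] holds the q-point DFT of x(pr + m), r = 0..q-1 (Eq 3.105)
    G = [dft([x[p * r + m] for r in range(q)]) for m in range(p)]
    # Eq (3.104) with W_N^{prk} = W_q^{rk} (Eq 3.106), so G_m(k) = G_m(k mod q)
    return [sum(cmath.exp(-2j * cmath.pi * m * k / N) * G[m][k % q] for m in range(p))
            for k in range(N)]


if __name__ == "__main__":
    x = [complex(n % 7, (3 * n) % 5) for n in range(30)]
    err = max(abs(a - b) for a, b in zip(two_factor_dft(x, 5, 6), dft(x)))
    print(f"max difference for N = 30, p = 5, q = 6: {err:.1e}")
```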


The inner sum in Eq (3.107) is a 6-point DFT, which
can be decomposed into a 3 by 2 matrix by dividing the
sequence x(5r+m) into three sequences, each two points
long. The inner summation in Eq (3.107) can be represented
using the notation of Eq (3.104) as:

$$G(k) = \sum_{s=0}^{p-1} W_M^{sk} \sum_{t=0}^{q-1} g(pt+s)\, W_M^{ptk} \tag{3.108}$$

where M is now equal to p·q = 3·2. Substituting p and
q yields:

$$G(k) = \sum_{s=0}^{2} W_6^{sk} \sum_{t=0}^{1} g(3t+s)\, W_6^{3tk} \tag{3.109}$$

The expression in Eq (3.109) can be substituted into
Eq (3.107) to give:

$$X(k) = \sum_{m=0}^{4} W_{30}^{mk} \sum_{s=0}^{2} W_6^{sk} \sum_{t=0}^{1} x(15t+5s+m)\, W_2^{tk} \tag{3.110}$$

where

$$r = 3t + s, \qquad g(3t+s) = x(5(3t+s)+m) = x(15t+5s+m)$$

$$W_6^{3tk} = \exp(-j2\pi \cdot 3tk/6) = \exp(-j2\pi \cdot tk/2) = W_2^{tk}$$

$$m = 0, 1, 2, 3, 4; \qquad s = 0, 1, 2; \qquad t = 0, 1$$

The complete flowgraph is shown in Figure 3.18 and
implements Eq (3.110).
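As a sketch (hypothetical function names), Eq (3.110) can be evaluated term by term for N = 30 and compared with the direct DFT. The loop nest deliberately follows the equation literally, recomputing the inner 2-point and 3-point sums for every k, so it demonstrates the index mapping x(15t + 5s + m) rather than the operation savings of the flowgraph.

```python
import cmath


def W(L, e):
    """Twiddle factor W_L^e = exp(-j 2 pi e / L)."""
    return cmath.exp(-2j * cmath.pi * e / L)


def dft30_eq3110(x):
    """Evaluate Eq (3.110) for N = 30 = 5 * 3 * 2, k = 0..29."""
    if len(x) != 30:
        raise ValueError("Eq (3.110) is written for N = 30")
    X = []
    for k in range(30):
        total = 0j
        for m in range(5):
            middle = 0j
            for s in range(3):
                # innermost 2-point sum over t of x(15t + 5s + m) W_2^{tk}
                inner = sum(x[15 * t + 5 * s + m] * W(2, t * k) for t in range(2))
                middle += W(6, s * k) * inner
            total += W(30, m * k) * middle
        X.append(total)
    return X


if __name__ == "__main__":
    x = [complex(n, -n % 4) for n in range(30)]
    direct = [sum(x[n] * W(30, n * k) for n in range(30)) for k in range(30)]
    print(max(abs(a - b) for a, b in zip(dft30_eq3110(x), direct)))
```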

3.4.2 Digit Reversal Algorithm (General). The
permutation matrix P is required because the transformed
result is in digit-reversed order. Given a factorization
N = n_m n_{m-1} ... n_2 n_1, the Fourier coefficient X(k) with

$$k = k_m n_{m-1} n_{m-2} \cdots n_1 + \cdots + k_2 n_1 + k_1, \qquad 0 \le k_i \le n_i - 1 \tag{3.111}$$

is found in location:

$$k' = k_1 n_2 n_3 \cdots n_m + k_2 n_3 n_4 \cdots n_m + \cdots + k_m \tag{3.112}$$

In general the interchange of k with k' can be done "in place"
if N is factored such that (Singleton, 1977):

$$n_i = n_{m-i+1} \tag{3.113}$$

for i less than m-i+1. For this factoring, k can be counted
in natural order and k' in digit-reversed order, as described
for the fixed-radix algorithm bit reversal.
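A minimal sketch of Eqs (3.111) and (3.112) follows, assuming the factors are supplied as the list [n_1, n_2, ..., n_m] so that k_1 is the least significant digit. The function name is hypothetical. For a symmetric factoring such as 2·3·5·3·2, which satisfies Eq (3.113), applying the map twice returns k, which is why the interchange can be done in place by pair swaps; for 3·2·3·5·3 it does not.

```python
def digit_reverse(k, factors):
    """Location k' of X(k) for factors [n_1, ..., n_m] (Eqs 3.111 and 3.112)."""
    digits = []
    for n in factors:            # k = k_1 + k_2 n_1 + k_3 n_1 n_2 + ..., 0 <= k_i < n_i
        digits.append(k % n)
        k //= n
    if k != 0:
        raise ValueError("k must be less than the product of the factors")
    k_rev = 0
    for n, d in zip(factors, digits):
        # Horner form of k' = k_1 n_2...n_m + k_2 n_3...n_m + ... + k_m
        k_rev = k_rev * n + d
    return k_rev


if __name__ == "__main__":
    # ordinary bit reversal for N = 8: [0, 4, 2, 6, 1, 5, 3, 7]
    print([digit_reverse(k, [2, 2, 2]) for k in range(8)])
```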

To implement this technique for mixed radices, N is
factored into its prime factors and the "square" factors are
arranged symmetrically around the "square-free" factors
of N. For example, let N=270 be factored as:

$$N = 3 \cdot 2 \cdot 3 \cdot 5 \cdot 3$$

Now the reordering, P, is factored into:

$$P = P_1 P_2 \tag{3.114}$$

The reordering "associated with the square factors of
n is done by pair interchanges as previously described,
except that the digits of n corresponding to the square-
free factors are held constant and the digits of the
square factors are exchanged symmetrically" (Singleton, 1977).
For example, if:


$$N = n_1 n_2 n_3 n_4 n_5 n_6 n_7 \tag{3.115}$$

with n_1 = n_7, n_2 = n_6, and n_3, n_4, n_5 relatively prime,
the interchange associated with the square factors n_1, n_7,
n_2, and n_6 is given by:

$$k = k_7 n_6 n_5 \cdots n_1 + k_6 n_5 n_4 \cdots n_1 + k_5 n_4 n_3 n_2 n_1 + k_4 n_3 n_2 n_1 + k_3 n_2 n_1 + k_2 n_1 + k_1 \tag{3.116}$$

interchanged with:

$$k' = k_1 n_6 n_5 \cdots n_1 + k_2 n_5 n_4 \cdots n_1 + k_5 n_4 n_3 n_2 n_1 + k_4 n_3 n_2 n_1 + k_3 n_2 n_1 + k_6 n_1 + k_7 \tag{3.117}$$

In this example, this first reordering places each element of
X(k) in the correct segment of length N/(n_1 n_2), grouped in
"subsequences" of n_1 n_2 consecutive elements (Singleton,
1977). The next reordering then finishes the reordering of
the n_3 n_4 n_5 subsequences within each N/(n_1 n_2) segment.

The above factorization is used in the Singleton and
IMSL mixed radix algorithms and generates complicated
FORTRAN code. A simpler alternative factorization was
written by the author and used in his mixed radix algorithm.
The simpler algorithm requires two additional arrays of
length N to store the intermediate results, which detracts
from the algorithm's utility when longer sequences are
transformed. The details of this factorization are
presented in Appendix E for interested readers.

3.4.3 Twiddle Factors. In Section 3.4.1 the factoring
into F_i was described, corresponding to a factor n_i. F_i can
be factored to give a product R_i T_i, where the matrix T_i
consists of N/n_i identical Fourier transforms of dimension n_i
and R_i is a diagonal twiddle factor matrix. The elements
of R_i are specified by the decimation-in-frequency version
of the FFT (Singleton, 1977).

The twiddle factor matrix R_i multiplies each transform
in T_i of dimension n_i by e^{jθ}, where θ is an angle from the
set:

$$0,\ \varphi,\ 2\varphi,\ \ldots,\ (n_i - 1)\varphi \tag{3.118}$$

and φ = 2π/N. No multiplication is needed for the zero
angle, which gives at most N(n_i - 1)/n_i complex
multiplications for the factor n_i. Over all m factors a
further N - 1 rotations also have a zero angle, so the total
number of complex twiddle multiplications is:

$$\sum_{i=1}^{m} N(n_i - 1)/n_i - (N - 1) \tag{3.119}$$

This result is used in computing the number of real
multiplications and additions required by an N length FFT.
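Eq (3.119) can be checked by brute force. The sketch below assumes a standard decimation-in-frequency schedule (a modeling assumption, not a transcription of Singleton's code) in which the stage for factor n_i splits each sub-transform of length L into n_i legs and rotates leg j at position k by W_L^{jk}. It counts the rotations whose angle is nonzero and compares the total with Eq (3.119); the function names are hypothetical.

```python
from math import prod


def nonzero_rotations(factors):
    """Count twiddles with a nonzero angle in a DIF schedule using factors n_1, ..., n_m."""
    N = prod(factors)
    count = 0
    done = 1                          # product of the factors already processed
    for n in factors:
        L = N // done                 # length of each sub-transform at this stage
        for j in range(n):            # leg of the n-point butterfly
            for k in range(L // n):   # position within the sub-transform
                if (j * k) % L != 0:  # angle 2*pi*j*k/L is nonzero
                    count += done     # same pattern in each of the `done` sub-transforms
        done *= n
    return count


def eq_3_119(factors):
    """Sum of N(n_i - 1)/n_i over the factors, minus (N - 1)."""
    N = prod(factors)
    return sum(N * (n - 1) // n for n in factors) - (N - 1)


if __name__ == "__main__":
    for f in ([2, 2, 2], [5, 3, 2], [3, 2, 3, 5, 3]):
        print(f, nonzero_rotations(f), eq_3_119(f))
```

Because j < n_i and k < L/n_i, the product jk is always less than L, so a rotation is trivial exactly when j = 0 or k = 0; summing the k = 0 cases over all stages telescopes to the N - 1 term.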

3.4.4 Real Operations Count for Computing Sine and Cosine
Difference Equations. Recall from Section 3.1 that the
trigonometric values used in an FFT can be computed using the
difference equations:

$$\cos((k+1)a) = (C \cos(ka) - S \sin(ka)) + \cos(ka) \tag{3.120}$$

$$\sin((k+1)a) = (C \sin(ka) + S \cos(ka)) + \sin(ka) \tag{3.121}$$

where

$$a = 2\pi/N \ \text{radians}, \quad C = -2\sin^2(a/2), \quad S = \sin(a), \quad \cos(0) = 1, \quad \sin(0) = 0$$

In the case of the author's mixed radix FFT, the
difference equations are computed N times and the sine and
cosine terms are stored in two lookup tables. The difference
equations are given by:

WKC(I) = C*WKC(I-1) - S*WKS(I-1) + WKC(I-1)     (3.122)

WKS(I) = C*WKS(I-1) + S*WKC(I-1) + WKS(I-1)     (3.123)
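The table-building loop described above can be sketched in Python as follows (the function name is hypothetical; `wkc` and `wks` mirror the FORTRAN array names). It fills N table entries with Eqs (3.122) and (3.123), starting from cos(0) = 1 and sin(0) = 0, and the usage lines compare the tables with the library sine and cosine. The sketch shows only the recursion; it does not attempt to reproduce the operation tally of Eqs (3.124) and (3.125).

```python
import math


def trig_tables(N):
    """Fill cos/sin tables for angles i*a, a = 2*pi/N, i = 0..N-1,
    with the difference equations (3.120)-(3.123)."""
    a = 2.0 * math.pi / N
    C = -2.0 * math.sin(a / 2.0) ** 2      # C = -2 sin^2(a/2)
    S = math.sin(a)                        # S = sin(a)
    wkc = [1.0] * N                        # cos(0) = 1
    wks = [0.0] * N                        # sin(0) = 0
    for i in range(1, N):
        # both updates use the previous entries, as in Eqs (3.122)-(3.123)
        wkc[i] = C * wkc[i - 1] - S * wks[i - 1] + wkc[i - 1]
        wks[i] = C * wks[i - 1] + S * wkc[i - 1] + wks[i - 1]
    return wkc, wks


if __name__ == "__main__":
    N = 1024
    wkc, wks = trig_tables(N)
    err = max(max(abs(c - math.cos(2 * math.pi * i / N)),
                  abs(s - math.sin(2 * math.pi * i / N)))
              for i, (c, s) in enumerate(zip(wkc, wks)))
    print(f"largest table error for N = {N}: {err:.1e}")
```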
Eqs (3.122) and (3.123) require 4 real multiplications and
10 real additions each time they are computed. Given that they
are computed N times, the operations count is given by:

$$\text{real mult} = 4N \tag{3.124}$$

$$\text{real adds} = 10N \tag{3.125}$$

The  IMSL  and  Singleton  FFTs  do  not  use  the  sine  and 
cosine  lookup  tables  in  order  to  save  memory  arrays. 

Instead  the  sine  and  cosine  values  are  computed  as  needed 
in  the  FFT  program  resulting  in  an  intricate  FORTRAN  code. 

It  was  determined  from  the  FORTRAN  coded  IMSL  and  Singleton 
FFTs  that  both  utilize  the  same  method  of  computing  the  sine 
and  cosine  difference  equations.  For  this  reason  only  the 
Singleton  FFT  algorithm  was  studied. 

An algorithm which computes the number of real
operations required was interpolated from "counters" placed
in the FFT FORTRAN code in Appendix F. The counters provided the
number of times that each section of the FFT subroutine
was used to compute the sine and cosine values for different
values of N. The labels for the counters are shown below
along with the lines of FORTRAN code where they were
positioned. The lines of code are shown in Appendix F.


I2C:   Counter for the radix-2 difference equation in lines
       2330 - 2340.

I2CL:  Counter for the radix-2 sine and cosine library calls
       in lines 2650 - 2660.

I4C1:  Counter for the radix-4 section which computes the sine
       and cosine terms of the W leg of the radix-4 in lines
       3030 - 3040. Refer to Figure 3.19, which shows the
       radix-4 butterfly flowgraph.

I4C2:  Counter for the radix-4 section which computes the sine
       and cosine terms of the W^2 and W^3 legs of the radix-4
       butterfly flowgraph in lines 3140 - 3170.

I4CL:  Counter for the radix-4 sine and cosine library calls
       in lines 3690 - 3700.

IGTF:  Counter for the general twiddle factors section in
       lines 4990 - 5000, which computes the sine and cosine
       for the W leg of the general radix-p FFT.

IGTFE: Counter for the general twiddle factors section which
       computes the sine and cosine for the remainder of the
       radix-p butterfly legs in lines 5170 - 5190.

IGTFL: Counter for the general radix-p sine and cosine
       library calls in lines 5290 - 5300.

Data was collected for over 70 values of N using these
counters. A subset of these values consisted of the 59
permissible sequence lengths of the PFA and WFTA. Based on the
results of these tests and study of the FORTRAN code in
Appendix F, the general expressions for these counters were
determined. Given that:

N = sequence length
NFAC(i) = factors of N (as factored by the Singleton subroutine)
m = number of factors of N
KSPAN_i = N/(NFAC(1)·NFAC(2)···NFAC(i-1))

then

$$\mathrm{I2C}_i = \begin{cases} (\mathrm{KSPAN}_i - 3)/2, & \mathrm{KSPAN}_i \ge 4 \text{ and odd} \\ (\mathrm{KSPAN}_i - 2)/2, & \mathrm{KSPAN}_i \ge 4 \text{ and even} \\ 0, & \mathrm{KSPAN}_i < 4 \end{cases}$$

For the factors of 2 in N the expression for I2C becomes:

$$\mathrm{I2C} = \sum_{i=1}^{j} \mathrm{I2C}_i \quad \text{for } j \text{ factors of 2 in } N \tag{3.126}$$


The expression for the number of sine and cosine calls
during computation of a factor of 2 is [KSPAN_i/32], where
[·] represents truncation of the result inside the brackets.
Using the "truncation" notation:

$$\mathrm{I2CL} = \sum_{i=1}^{j} [\mathrm{KSPAN}_i/32] \tag{3.127}$$


The radix-4 section uses the same notational conventions
for KSPAN and truncation. The expressions for I4C1, I4C2, and
I4CL become:

$$\mathrm{I4C2}_i = \mathrm{KSPAN}_i - 1 \tag{3.128}$$

$$\mathrm{I4CL}_i = [\mathrm{KSPAN}_i/32] \tag{3.129}$$

$$\mathrm{I4C1}_i = \mathrm{I4C2}_i - \mathrm{I4CL}_i \tag{3.130}$$


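For all factors of 4 in N the expressions become:

$$\mathrm{I4C2} = \sum_{i=1}^{k} (\mathrm{KSPAN}_i - 1) \tag{3.131}$$

$$\mathrm{I4CL} = \sum_{i=1}^{k} [\mathrm{KSPAN}_i/32] \tag{3.132}$$

$$\mathrm{I4C1} = \mathrm{I4C2} - \mathrm{I4CL} \tag{3.133}$$

where there are k factors of 4 in N.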

The general expressions for IGTF, IGTFE, and IGTFL were
derived to be:

$$\mathrm{IGTFL}_i = [\mathrm{KSPAN}_i/32] \tag{3.134}$$

$$\mathrm{IGTF}_i = \mathrm{KSPAN}_i - \mathrm{IGTFL}_i - 2 \tag{3.135}$$

$$\mathrm{IGTFE}_i = (\mathrm{KSPAN}_i - 1)(\mathrm{NFAC}(i) - 1) \tag{3.136}$$

The result for the general radix-p section becomes:

$$\mathrm{IGTF} = \sum_{i=1}^{k} \mathrm{IGTF}_i \tag{3.137}$$

$$\mathrm{IGTFL} = \sum_{i=1}^{k} \mathrm{IGTFL}_i \tag{3.138}$$

$$\mathrm{IGTFE} = \sum_{i=1}^{k} \mathrm{IGTFE}_i \tag{3.139}$$

Eqs (3.124) through (3.139) were programmed in FORTRAN and
then tabulated as a function of N in Table 3.4. These
results identically match the tests conducted using the
counters.

Examining the FORTRAN code where the counters were
located gives the number of operations performed each time
one of the counters was incremented. These results are
presented in Table 3.5 for all the counters. The number
of real operations, sine and cosine library calls, and
exponentiations can be determined for all N length sequences
by using Tables 3.4 and 3.5. The general expressions are
given by:

$$\mathrm{KADD} = 4(\mathrm{I2C} + \mathrm{I4C2} + \mathrm{IGTF}) + 3(\mathrm{I4C1}) + 2(\mathrm{IGTFE}) \tag{3.140}$$

$$\mathrm{KMULT} = 4(\mathrm{I2C} + \mathrm{I4C2} + \mathrm{IGTF} + \mathrm{IGTFE}) + 6(\mathrm{I4C1}) \tag{3.141}$$

$$\mathrm{KEXP} = 2(\mathrm{I4C1}) \tag{3.142}$$
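Eqs (3.140) through (3.142) are the counter values weighted by the per-increment costs of Table 3.5. A small sketch of that weighting, with a hypothetical dictionary holding the Table 3.5 entries, is:

```python
# Per-increment costs from Table 3.5:
# (real adds, real mults, exponentiations, sine calls, cosine calls)
OPS_PER_COUNT = {
    "I2C":   (4, 4, 0, 0, 0),
    "I2CL":  (0, 0, 0, 1, 1),
    "I4C1":  (3, 6, 2, 0, 0),
    "I4C2":  (4, 4, 0, 0, 0),
    "I4CL":  (0, 0, 0, 1, 1),
    "IGTF":  (4, 4, 0, 0, 0),
    "IGTFE": (2, 4, 0, 0, 0),
    "IGTFL": (0, 0, 0, 1, 1),
}


def weighted_totals(counts):
    """Return (KADD, KMULT, KEXP, sine calls, cosine calls) for given counter values."""
    totals = [0, 0, 0, 0, 0]
    for name, value in counts.items():
        for i, cost in enumerate(OPS_PER_COUNT[name]):
            totals[i] += value * cost
    return tuple(totals)
```

For any set of counter values the first three totals agree with Eqs (3.140), (3.141), and (3.142); the last two give the library calls.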

3.4.5 Real Operations Count for Mixed Radix FFTs. The
real operations count is derived from the number of complex
twiddle factors, the number of butterflies, and the number
of sine and cosine terms computed using difference equations.

TABLE 3.5

OPERATIONS FOR EACH COUNT

  Counter   Real Add   Real Mult   Exponentiation   Sine Calls   Cosine Calls
  I2C           4          4             0               0             0
  I2CL          0          0             0               1             1
  I4C1          3          6             2               0             0
  I4C2          4          4             0               0             0
  I4CL          0          0             0               1             1
  IGTF          4          4             0               0             0
  IGTFE         2          4             0               0             0
  IGTFL         0          0             0               1             1

AD-A100 702 AIR FORCE INST OF TECH WRIGHT-PATTERSON AFB OH SCHOOL--ETC F/G 12/1
EFFICIENT COMPUTER IMPLEMENTATIONS OF FAST FOURIER TRANSFORMS.(U)
DEC 80 U D BLANKEN
UNCLASSIFIED AFIT/SE/EE/80D-9 NL


Given that N is factored as:

$$N = p_1 p_2 \cdots p_m \tag{3.143}$$

the number of twiddle factors has been shown (Singleton,
1969) to be:

$$\sum_{i=1}^{m} N(p_i - 1)/p_i - (N - 1) \tag{3.144}$$

where m is the total number of factors of N. The number of
butterflies required for an N length sequence is given by:

$$\sum_{i=1}^{m} N/p_i \tag{3.145}$$

The total real operations count is determined by adding (a)
the number of real multiplications and additions required
per butterfly times Eq (3.145), plus (b) the complex twiddle
factor multiplications times Eq (3.144), plus (c) the number
of additions and multiplications given by Eqs (3.140) and
(3.141).

Assuming a complex multiplication requires four real
multiplications and two additions, a general expression for
the real operations count can be determined for the mixed
radix FFTs.
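The bookkeeping of items (a) through (c) can be sketched as a function of the factor list. The per-butterfly costs are an input; the values used in the example (0 multiplications and 4 additions for a factor of 2, 4 and 12 for a factor of 3, 16 and 32 for a factor of 5) are the figures quoted for Singleton's special sections in this section, and the counter terms of Eqs (3.140) and (3.141) default to zero. All names are hypothetical.

```python
from math import prod


def butterflies_and_twiddles(factors):
    """Eq (3.145) butterflies and Eq (3.144) complex twiddles for N = p_1 p_2 ... p_m."""
    N = prod(factors)
    butterflies = sum(N // p for p in factors)
    twiddles = sum(N * (p - 1) // p for p in factors) - (N - 1)
    return butterflies, twiddles


def mixed_radix_real_ops(factors, per_butterfly, kmult=0, kadd=0):
    """(real mult, real adds) = (a) butterflies + (b) twiddles + (c) counter terms.

    per_butterfly maps a factor p to its (real mults, real adds) per butterfly;
    each complex twiddle is charged 4 real mults and 2 real adds."""
    N = prod(factors)
    _, twiddles = butterflies_and_twiddles(factors)
    mult = sum(per_butterfly[p][0] * (N // p) for p in factors) + 4 * twiddles + kmult
    adds = sum(per_butterfly[p][1] * (N // p) for p in factors) + 2 * twiddles + kadd
    return mult, adds


if __name__ == "__main__":
    costs = {2: (0, 4), 3: (4, 12), 5: (16, 32)}
    print(butterflies_and_twiddles([5, 3, 2]), mixed_radix_real_ops([5, 3, 2], costs))
```

For N = 30 = 5·3·2 this gives 31 butterflies and 30 complex twiddles, and (256, 432) real multiplications and additions before the counter terms are added.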

Singleton's mixed radix algorithm contains special
transform sections for factors of 2, 3, 4, and 5 as well as
a general section for other odd factors. This requires
that N be represented as:

$$N = 2^r\, 3^s\, 4^t\, 5^u\, p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k} \tag{3.146}$$

The IMSL mixed radix FFT (FFTCC) does not have a special
section for factors of 5 and uses the general section to
transform these factors. The author's mixed radix FFT (FFTMR)
has sections for 2, 3, 4, and 5 but does not have the general
transform section. Only the detailed development of the
operations count for Singleton's algorithm is presented here,
because the other two algorithms are subsets of it. The
general expressions for real operations versus N are given
for the other two algorithms in Appendices G and H.

The radix-2 section of the FORTRAN code for Singleton's
algorithm is shown in Figure 3.20. For factors of two the
twiddle (rotation) factor complex multiplications are
computed in this section rather than in the "general rotation
section" to reduce the array indexing required. Using
Eq (3.145) the total number of butterflies is rN/2, and from
Eq (3.144) the total number of twiddle factors is rN/2
(neglecting the -(N-1) term, which will be subtracted once
the complete real operations count for all factors has been
developed). The transform for a factor of 2 (refer to
Figure 3.20) is computed in lines 2200-2230 using 4 real
additions if no twiddles are required, or in lines 2450-2500
if twiddles are necessary. The general expression for factors
of two becomes:

$$\text{real mult} = 4(rN/2) = 2rN \tag{3.147}$$

$$\text{real adds} = 4(rN/2) + 2(rN/2) = 3rN \tag{3.148}$$

The factors of 3 section shown in Figure 3.21 performs
only the butterfly and uses the general rotation (twiddle)
section to twiddle the data (the general twiddle factor
section is shown in Figure 3.24). Using Eqs (3.144) and
(3.145), the number of butterflies for factors of 3 is sN/3
and the number of complex twiddles is s(2N/3). Examining lines
2760-2870 in Figure 3.21 shows 4 real multiplications and 12
real additions per butterfly. Each complex twiddle requires
4 real multiplications and 2 real additions. The expression
for the factors of 3 section becomes:

$$\text{real mult} = 4(N/3)s + 4(2/3)Ns = 4sN \tag{3.149}$$

$$\text{real adds} = 12(N/3)s + 2(2/3)Ns = 16sN/3 \tag{3.150}$$

The factors of 4 section in Figures 3.22a and b includes
the twiddles in the butterfly section to minimize array
indexing. The number of butterflies computed for t factors
of 4 is tN/4 and the number of complex twiddles is t(3N/4),
from Eqs (3.144) and (3.145). From lines 3210-3320 and
3540-3570 the number of real additions per butterfly is 16.
Every complex twiddle requires 4 real multiplications and
2 additions. Combining the butterfly and twiddle operations
results in the general expression for factors of 4:

$$\text{real mult} = 4(3N/4)t = 3tN \tag{3.151}$$

$$\text{real adds} = 2(3N/4)t + 16(N/4)t = 3tN/2 + 8tN/2 = 11tN/2 \tag{3.152}$$

The transform section for factors of 5 shown in Figure
3.23 computes the butterflies for the u factors of 5. There
are uN/5 butterflies and u(4N/5) complex twiddles based on
Eqs (3.144) and (3.145). Examination of lines 3820-4090
in Figure 3.23 shows that 16 real multiplications and 32 real
additions are required per butterfly. Combining the
butterfly and complex twiddle operations provides the
general expression for real operations for factors of 5:

$$\text{real mult} = 16(N/5)u + 4(4N/5)u = 32uN/5 \tag{3.153}$$

$$\text{real adds} = 32(N/5)u + 2(4N/5)u = 8uN \tag{3.154}$$

where u is the number of factors of 5 in N.
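Eqs (3.147) through (3.154) follow one pattern for each special factor f occurring c times: c·N/f butterflies at the section's cost, plus c·N(f-1)/f complex twiddles at 4 multiplications and 2 additions each. The sketch below (hypothetical names; the -(N-1) twiddle correction is left out, as in the text) builds the totals from those pieces so that they can be compared with the closed forms.

```python
from math import prod

# Per-butterfly (real mults, real adds) quoted for Singleton's special sections
SECTION_COST = {2: (0, 4), 3: (4, 12), 4: (0, 16), 5: (16, 32)}


def special_sections_ops(N, counts):
    """counts maps a special factor (2, 3, 4, 5) to how often it occurs (r, s, t, u)."""
    if N % prod(f ** c for f, c in counts.items()) != 0:
        raise ValueError("the factors do not divide N")
    mult = adds = 0
    for f, c in counts.items():
        bm, ba = SECTION_COST[f]
        butterflies = c * N // f
        twiddles = c * N * (f - 1) // f           # -(N-1) correction applied later
        mult += bm * butterflies + 4 * twiddles
        adds += ba * butterflies + 2 * twiddles
    return mult, adds


if __name__ == "__main__":
    print(special_sections_ops(120, {2: 1, 3: 1, 4: 1, 5: 1}))   # (1848, 2620)
```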

The general transform section for odd prime factors
is more complex than the special factor sections. To aid
in describing the number of real operations, a p-radix
is defined such that p is an odd prime greater than 5 with
an associated integer power m_i. The real operations
count for the general section does not include additions
associated with array indexing, nor does it count the
multiplications and additions needed to recursively compute
the sine and cosine terms.

Based on the FORTRAN program for the odd factors shown
in Figures 3.24a and b, there are five sources of real
operations for each p_i factor. The first source, shown in
lines 4310-4360, is computing the (p_i - 1)/2 complex
multipliers for the butterfly legs, which requires:

$$\text{real mult} = 4(p_i - 1)/2 = 2(p_i - 1) \tag{3.155}$$

$$\text{real adds} = 2(p_i - 1)/2 = p_i - 1 \tag{3.156}$$

Figure 3.24a. General Factor Section of Singleton's FFT.

These complex multipliers are computed only once for each
distinct factor p_i. For example, if N = 28 = 7·4, the factor 7
requires (7-1)/2 complex multipliers. If N = 196 = 7·7·4, there
are still only (7-1)/2 complex multipliers needed.

The second source of real operations is the computation of the butterfly transmittances that require only real additions. From Eq (3.145) there are $m_i N/p_i$ butterflies required for the $m_i$ factors of $p_i$. For each butterfly there are $(p_i-1)/2$ such transmittances. Examining lines 4470-4540 in Figure 3.24a shows that each of the $(p_i-1)/2$ transmittances requires 6 additions. Combining these results produces the general expression for the real additions:

$$\text{real adds} = 6\,\frac{p_i-1}{2}\cdot\frac{m_i N}{p_i} = \frac{3 N m_i (p_i-1)}{p_i} \tag{3.157}$$

The third source of operations is the $(p_i-1)^2/4$ butterfly transmittances that require real multiplications and additions. Lines 4510-4750 in Figure 3.24b show that 4 real multiplications and 4 real additions are needed for each. Combining this with the number of transmittances and butterflies gives:

$$\text{real mult} = 4\left(\frac{m_i N}{p_i}\right)\left(\frac{(p_i-1)^2}{4}\right) = \frac{m_i N (p_i-1)^2}{p_i} \tag{3.158}$$

$$\text{real adds} = \frac{m_i N (p_i-1)^2}{p_i} \tag{3.159}$$

The fourth source of real operations comes from computing the $(p_i-1)/2$ [...] in each of the $m_i N/p_i$ butterflies. Examining lines 4300-1230 shows that each requires 4 real additions. Combining these results gives the total as:

$$\text{real adds} = \left(\frac{m_i N}{p_i}\right) 4\,\frac{p_i-1}{2} = \frac{2 m_i N (p_i-1)}{p_i} \tag{3.160}$$

The final source of real operations is shown in Figure 3.24b, lines 5120-5140, which perform the complex twiddle multiplications. From Eq (3.144) there are $m_i N (p_i-1)/p_i$ complex twiddles, which provide the general expressions:

$$\text{real mult} = \frac{4 m_i N (p_i-1)}{p_i} \tag{3.161}$$

$$\text{real adds} = \frac{2 m_i N (p_i-1)}{p_i} \tag{3.162}$$

Combining Eqs (3.155) through (3.162) gives the expressions for the real operations in the general odd factors section:

$$\text{real mult} = \sum_{i=1}^{k}\left[2(p_i-1) + \frac{m_i N (p_i-1)^2}{p_i} + \frac{4 m_i N (p_i-1)}{p_i}\right] \tag{3.163}$$

$$\begin{aligned}\text{real adds} &= \sum_{i=1}^{k}\left[(p_i-1) + \frac{3 N m_i (p_i-1)}{p_i} + \frac{m_i N (p_i-1)^2}{p_i} + \frac{2 m_i N (p_i-1)}{p_i} + \frac{2 m_i N (p_i-1)}{p_i}\right] \\ &= \sum_{i=1}^{k}\left[(p_i-1) + \frac{7 N m_i (p_i-1)}{p_i} + \frac{m_i N (p_i-1)^2}{p_i}\right]\end{aligned} \tag{3.164}$$

Assuming that the sequence can be factored into $N = 2^r\,3^s\,4^t\,5^u\,p_1^{m_1} \cdots p_k^{m_k}$, the expressions for the total number of real operations can be written using Eqs (3.140) through (3.164) as:

$$\text{real mult} = 2rN + 4sN + 3tN + \frac{32uN}{5} + \sum_{i=1}^{k}\left[2(p_i-1) + \frac{m_i N (p_i-1)^2}{p_i} + \frac{4 m_i N (p_i-1)}{p_i}\right] - 4(N-1) + \text{KMULT} \tag{3.165}$$

$$\text{real adds} = 3rN + \frac{16sN}{3} + \frac{11tN}{2} + 8uN + \sum_{i=1}^{k}\left[(p_i-1) + \frac{7 N m_i (p_i-1)}{p_i} + \frac{m_i N (p_i-1)^2}{p_i}\right] - 2(N-1) + \text{KADD} \tag{3.166}$$

Notice that Eqs (3.165) and (3.166) subtract $4(N-1)$ real multiplications and $2(N-1)$ real additions, respectively, because the first stage of any decimation-in-time FFT does not require twiddle factors (likewise for the last stage of a decimation-in-frequency FFT). These equations also include KMULT and KADD, which are the real operations required to compute the recursive sine and cosine difference equation.
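Eqs (3.165) and (3.166) can be evaluated mechanically once N is factored. The sketch below uses a hypothetical helper, `singleton_real_ops`, which is not part of Singleton's program: it takes the exponents r, s, t, u and a list of $(p_i, m_i)$ pairs for the odd primes above 5, and it leaves out KMULT and KADD because their values depend on the recursive sine/cosine generator.

```python
from fractions import Fraction

def singleton_real_ops(r, s, t, u, odd_factors=()):
    """Return (N, real mults, real adds) from Eqs (3.165)-(3.166), without KMULT/KADD.

    N = 2**r * 3**s * 4**t * 5**u * prod(p**m for p, m in odd_factors),
    where every p in odd_factors is an odd prime greater than 5.
    """
    N = 2**r * 3**s * 4**t * 5**u
    for p, m in odd_factors:
        N *= p**m
    # Special factor sections (radix 2, 3, 4 and 5).
    mults = 2*r*N + 4*s*N + 3*t*N + Fraction(32*u*N, 5)
    adds = 3*r*N + Fraction(16*s*N, 3) + Fraction(11*t*N, 2) + 8*u*N
    # General odd factor section, Eqs (3.163)-(3.164).
    for p, m in odd_factors:
        mults += 2*(p - 1) + Fraction(m*N*(p - 1)**2, p) + Fraction(4*m*N*(p - 1), p)
        adds += (p - 1) + Fraction(7*N*m*(p - 1), p) + Fraction(m*N*(p - 1)**2, p)
    # No twiddles in the first decimation-in-time stage: N-1 complex multiplies saved.
    mults -= 4*(N - 1)
    adds -= 2*(N - 1)
    return N, mults, adds

print(singleton_real_ops(2, 1, 0, 2, [(7, 1)]))   # N = 4 * 3 * 25 * 7 = 2100
```

Exact fractions are used because individual terms such as $32uN/5$ need not be integers for every combination of exponents, even though the totals for a real program are.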

Similar expressions and derivations were performed for the IMSL FFT and the author's FFT but, because of the redundancy, they are derived in Appendices G and E, respectively. The general expression for the real operations required by the IMSL mixed radix FFT (where $N = 2^r\,3^s\,4^t\,p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}$) is given by:

$$\text{real mult} = 2rN + 4sN + 3tN + \sum_{i=1}^{k}\left[2(p_i-1) + \frac{4 m_i N (p_i-1)}{p_i} + \frac{m_i N (p_i-1)^2}{p_i}\right] - 4(N-1) + \text{KMULT} \tag{3.167}$$

$$\text{real adds} = 3rN + 6sN + \frac{11tN}{2} + \sum_{i=1}^{k}\left[(p_i-1) + \frac{8 m_i N (p_i-1)}{p_i} + \frac{m_i N (p_i-1)^2}{p_i}\right] - 2(N-1) + \text{KADD} \tag{3.168}$$

where KMULT and KADD are the multiplications and additions needed to compute the sine and cosine terms. The general expression for the real operations required by the author's mixed radix FFT (where $N = 2^r\,3^s\,4^t\,5^u$) is given by:

$$\text{real mult} = 2rN + 4sN + 3tN + \frac{32uN}{5} - 4(N-1) + 4N \tag{3.169}$$

$$\text{real adds} = 3rN + \frac{16sN}{3} + \frac{11tN}{2} + 8uN - 2(N-1) + 10N \tag{3.170}$$
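For the author's FFTMR, which handles only $N = 2^r\,3^s\,4^t\,5^u$, Eqs (3.169) and (3.170) are closed forms that already include the $4N$ multiplications and $10N$ additions for the sine and cosine terms. The sketch below evaluates them with a hypothetical helper, `fftmr_real_ops`, and prints each count divided by $N \log_2 N$ for pure powers of each radix, which is one simple way to compare the radices.

```python
import math
from fractions import Fraction

def fftmr_real_ops(r, s, t, u):
    """Return (N, real mults, real adds) from Eqs (3.169)-(3.170) for N = 2**r 3**s 4**t 5**u."""
    N = 2**r * 3**s * 4**t * 5**u
    mults = 2*r*N + 4*s*N + 3*t*N + Fraction(32*u*N, 5) - 4*(N - 1) + 4*N
    adds = (3*r*N + Fraction(16*s*N, 3) + Fraction(11*t*N, 2) + 8*u*N
            - 2*(N - 1) + 10*N)
    return N, mults, adds

# Pure powers of each radix at similar lengths, normalized by N*log2(N).
for label, exps in [("4^4", (0, 0, 4, 0)), ("2^8", (8, 0, 0, 0)),
                    ("3^5", (0, 5, 0, 0)), ("5^3", (0, 0, 0, 3))]:
    N, m, a = fftmr_real_ops(*exps)
    scale = N * math.log2(N)
    print(f"{label:>4}  N={N:4d}  mults/NlogN={float(m) / scale:.2f}  "
          f"adds/NlogN={float(a) / scale:.2f}")
```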

The real operations count for Singleton's mixed radix FFT is shown for $N \le 200$ in Figures 3.25 and 3.26. The operations count plotted includes only the additions and multiplications for the butterflies and twiddle factors, in order to demonstrate the $N^2$ "upper bound" and the $N \log_2 N$ "lower bound". The $N^2$ upper bound occurs in the mixed radix FFTs when a prime number of points must be transformed. The $N \log_2 N$ lower bound is reached when $N = 2^m$. In between the $N^2$ and $N \log_2 N$ bounds there are other "bounds", which are observed in Figure 3.25. The dashed lines represent numbers which are not primes but are not highly factorable either. The dashed lines approach $N \log_2 N$ as $N$ becomes more factorable.

Figure 3.25. Multiplications vs N for Singleton's FFT.

The relative efficiency of radix-2, 3, 4 and 5 FFTs is observed in Figures 3.27 and 3.28. These figures plot real operations counts for the mixed radix FFT for N less than 250 (where N contains no factors other than 2, 3, 4 and 5) and annotate the integer powers of 2, 3, 4 and 5. Notice that the fixed radix-2 and radix-4 transforms provide the "lower bound" and the radix-3 and radix-5 transforms provide the "upper bound" on the number of real operations, which shows that integer powers of 2 and 4 require the fewest real operations and integer powers of 3 and 5 the most. Other combinations of factors, e.g., N = 120 = 5·4·3·2, have real operations counts which fall between the "bounds".

3.4.6 Memory Requirements for Mixed Radix FFTs.

As in the case of fixed radix algorithms, a major consideration in selecting a particular mixed radix algorithm is the memory required to execute the FFT subroutine, given the memory storage limitations of the computer to be used. The memory requirements for the three mixed radix FFTs are given here as a function of the sequence length N. Each algorithm has program and memory array requirements, which are listed below.

All the algorithms were compiled on the CDC Cyber system at AFIT, and the program memory required by each subroutine was determined from a "load map" generated by the command MAP, PART. This load map gives the size of all programs used during execution. The array storage requirements were determined from the FORTRAN coded programs and the reference material provided with the IMSL and Singleton FFT subroutines. The general expression for the memory requirements of each FFT subroutine (as a function of N) is given below.

Figure 3.27. Multiplications vs N for Multiples of 2, 3, 4 and 5.

Figure 3.28. Additions vs N for Multiples of 2, 3, 4, and 5.

The subroutine written by the author requires 899 words of program memory. This subroutine (FFTMR) also requires the "calling" program to dimension 6 arrays (A, B, AT, BT, WKS, and WKC) to length N. (Use of these arrays is explained in Appendix E.) This gives the total memory array required as:

$$\text{FFTMR memory} = 6N \tag{3.171}$$

The mixed radix subroutine written by Singleton (FFTSNG) requires 1100 words of program memory. Four arrays (AT, BT, CK, SK) are dimensioned to equal the maximum prime factor of N. If there are no prime factors greater than 5, these arrays may be reduced to 1. A fifth array (NP) is dimensioned to at least one less than the product K of the square-free factors (see Glossary) of N. If N contains at most one square-free factor, this array can be reduced to M + 1, where M is the maximum number of prime factors of N. Two more arrays (XR and XI) are dimensioned to length N. The total memory array storage becomes:

$$\text{FFTSNG memory} = 2N + 4\cdot\text{MAXPF} + (K-1 \text{ or } M+1) \tag{3.172}$$

where

N = sequence length
MAXPF = maximum prime factor of N
K = product of square-free factors
M = maximum number of prime factors

NOTE: K-1 or M+1 is selected in Eq (3.172) based on the number of square-free factors of N, as described in the preceding paragraph.
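Eq (3.172) can be evaluated directly from the prime factorization of N. In the sketch below (function names are hypothetical), treating the "square-free factors" as the primes that remain after equal primes are paired into squares, i.e. primes with an odd exponent, is our reading of the Glossary definition; it reproduces K = 21 for N = 2100 in the worked example that follows.

```python
def prime_factors(n):
    """Return {prime: exponent} for n >= 2 by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def fftsng_array_memory(N):
    """Array words for Singleton's FFTSNG, Eq (3.172): 2N + 4*MAXPF + (K-1 or M+1)."""
    f = prime_factors(N)
    maxpf = max(f)                      # largest prime factor
    m = sum(f.values())                 # number of prime factors, counting repeats
    # Assumption: square-free factors = primes left unpaired (odd exponent).
    square_free = [p for p, e in f.items() if e % 2 == 1]
    k = 1
    for p in square_free:
        k *= p
    np_words = k - 1 if len(square_free) > 1 else m + 1
    return 2 * N + 4 * maxpf + np_words

for N in (2100, 2099):
    arrays = fftsng_array_memory(N)
    print(N, arrays, arrays + 1100)     # + 1100 words of program memory
```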

The mixed radix subroutine (FFTCC) provided as part of the IMSL package on the CDC Cyber system requires 1061 words of program memory. A complex array (A) must be dimensioned to length N, and two other arrays (IWK and WK) are dimensioned to length IWORD, where:

$$\text{IWORD} = 3M + 3 + \max(4M + 7 + 6K,\; KB + 1 + 2\,JK) \tag{3.173}$$

To define the quantities M, K, KB and JK, a prime factor decomposition of N is required such that:

$$N = f_1^2\, f_2^2 \cdots f_{KT}^2\; f_{KT+1} \cdots f_{KT+JT}$$

where each $f_i$ is a prime number (other than 1) and $f_i \ne f_j$ for $i \ne j$, given that:

$$i, j \ge KT + 1; \qquad KT \ge 0; \quad JT \ge 0$$

Then:

$$M = 2KT + JT \tag{3.174}$$

is the number of prime factors in N, and:

$$K = \max_{1 \le j \le KT + JT} f_j \tag{3.175}$$

is the largest prime factor of N. KB and JK are defined as follows:

$$JK = 1 \cdot f_1 \cdot f_2 \cdots f_{KT} \tag{3.176}$$

where JK = 1 if KT = 0, and

$$KB = \frac{N}{(JK)^2} - 2 \tag{3.177}$$

Once M, K, JK, and KB are determined, they are substituted into Eq (3.173) to determine the value of IWORD, the actual work storage requirement. Counting only the arrays for the work vectors (IWK and WK) and the data arrays (A and B) gives the total array memory required for the IMSL FFT:

$$\text{Memory} = 2N + 2\cdot\text{IWORD} \tag{3.178}$$
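Eqs (3.173) through (3.178) can be chained in a few lines. In the sketch below, the hypothetical `fftcc_array_memory` builds the decomposition by putting every pair of equal primes into the squared part $f_1 \ldots f_{KT}$ and the single leftover primes into $f_{KT+1} \ldots f_{KT+JT}$; that grouping is our reading of the decomposition above, and it reproduces the N = 2100 and N = 2099 values worked out below.

```python
def prime_factors(n):
    """Return {prime: exponent} for n >= 2 by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def fftcc_array_memory(N):
    """Return (IWORD, array words) for IMSL FFTCC, Eqs (3.173)-(3.178)."""
    f = prime_factors(N)
    squares, singles = [], []
    for p, e in sorted(f.items()):
        squares += [p] * (e // 2)       # f_1 ... f_KT, each appearing squared in N
        if e % 2:
            singles.append(p)           # f_KT+1 ... f_KT+JT, distinct primes
    kt, jt = len(squares), len(singles)
    m = 2 * kt + jt                     # Eq (3.174)
    k = max(f)                          # Eq (3.175)
    jk = 1                              # Eq (3.176); stays 1 when KT = 0
    for p in squares:
        jk *= p
    kb = N // jk**2 - 2                 # Eq (3.177)
    iword = 3 * m + 3 + max(4 * m + 7 + 6 * k, kb + 1 + 2 * jk)   # Eq (3.173)
    return iword, 2 * N + 2 * iword     # Eq (3.178)

for N in (2100, 2099):
    iword, arrays = fftcc_array_memory(N)
    print(N, iword, arrays, arrays + 1061)   # + 1061 words of program memory
```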

An example of N = 2100 is used to demonstrate the use of Eqs (3.172) through (3.178) in computing the memory array required by the IMSL and Singleton subroutines. For N = 2100 the factors are $2^2 \cdot 5^2 \cdot 3 \cdot 7$, for which the FFTSNG variables become:

N = 2100 = sequence length
MAXPF = 7 = maximum prime factor in N
K = 3·7 = 21 = product of the square-free factors
M = 6 = maximum number of prime factors

Using Eq (3.172), the FFTSNG memory array is given by

$$2 \cdot 2100 + 4 \cdot 7 + (20 \text{ or } 7) = 4248 \tag{3.179}$$

NOTE: There are two square-free factors, 3 and 7; therefore choose 20 for the last term of Eq (3.179).

If this subroutine were used on the Cyber 74 computer, the program memory is added to the memory array to give a total memory of 4248 + 1100 = 5348 words.

For the IMSL subroutine FFTCC with N = 2100, the decomposition gives KT = 2 ($f_1 = 2$, $f_2 = 5$) and JT = 2 ($f_3 = 3$, $f_4 = 7$), so that M = 6 and K = 7. JK is then

$$JK = 1 \cdot f_1 \cdot f_2 = 2 \cdot 5 = 10 \tag{3.184}$$

and KB is

$$KB = \frac{N}{(JK)^2} - 2 = \frac{2100}{100} - 2 = 19 \tag{3.185}$$

The results of Eqs (3.181) through (3.185) provide the size of the work vector IWORD given by Eq (3.173):

$$\begin{aligned}\text{IWORD} &= 3M + 3 + \max(4M + 7 + 6K,\; KB + 1 + 2\,JK) \\ &= 18 + 3 + \max(24 + 7 + 42,\; 19 + 1 + 20) \\ &= 21 + \max(73, 40) = 94\end{aligned}$$

Substituting IWORD = 94 and N = 2100 into Eq (3.178) gives the memory array for FFTCC as:

$$2N + 2\cdot\text{IWORD} = 4200 + 188 = 4388 \tag{3.186}$$

Using this subroutine on the Cyber 74 computer requires 1061 words of program memory, which makes the total memory required equal to:

$$4388 + 1061 = 5449 \text{ words} \tag{3.187}$$

For this length N = 2100 sequence, the Singleton FFTSNG uses less memory (5348 words) than the IMSL FFTCC (5449 words).
The array memory requirements given by Eqs (3.172) and (3.178) are plotted in Figures 3.29 and 3.30 for N less than 200. It is readily observed that selectively adjusting N to be highly factorable (composite) minimizes the memory required by subroutines FFTCC or FFTSNG. As an example of how prime numbers increase the memory array sizes, consider N = 2099 for each algorithm. For FFTSNG the variables are MAXPF = 2099, K = 2099, and M = 1. Since N = 2099 contains only one square-free factor, the array NP can be dimensioned to M + 1 = 2. The memory array for FFTSNG becomes:

$$2N + 4\cdot\text{MAXPF} + 2 = 4198 + 8396 + 2 = 12596 \text{ words of memory array}$$

Adding the program memory of 1100 words yields the total memory required to execute FFTSNG on the Cyber 74:

$$\text{memory} = 12596 + 1100 = 13696 \tag{3.188}$$

For the IMSL FFT the variables are K = 2099, JK = 1, KT = 0, JT = 1, KB = 2097, and M = 1. The expression for IWORD becomes:

$$\begin{aligned}\text{IWORD} &= 3M + 3 + \max(4M + 7 + 6K,\; KB + 1 + 2\,JK) \\ &= 3 + 3 + \max(12605, 2100) = 12611\end{aligned}$$

The memory array, assuming execution on the Cyber 74 system, is:

$$2N + 2\cdot\text{IWORD} = 2 \cdot 2099 + 2 \cdot 12611 = 29420 \tag{3.189}$$

Adding the 1061 words of program memory gives a total of 30481 words, which is about 5.6 times the total memory (5449 words) required for N = 2100.

Figure 3.29. Memory Array vs N (N ≤ 200) for Singleton's FFT.

3.5 Fourier Transforms Using Fast Convolution Algorithms


The paper by Cooley and Tukey (1965) had a major impact on digital signal processing by stimulating the development and wide use of the FFT. More recently, several new ideas for computing the DFT have also influenced digital signal processing. In 1968 Rader observed that, when N is prime, computation of the DFT can be changed to a circular convolution by rearranging the data. Thus, given a fast way to do circular convolution, one has a fast DFT method. Winograd established the minimum number of multiplications for circular convolution of prime and prime power length sequences. He then proposed that these high-speed prime power convolutions be "nested" into long transforms to minimize multiplications. The Winograd nested algorithm has been studied and programmed (Silverman, 1977; McClellan and Nawab, 1979; Zohar, 1979) for computing the DFT of complex valued sequences.

An alternative to the Winograd algorithm was proposed by Kolba and Parks; it combines the concept of fast convolution with conventional DFT techniques to give another efficient DFT implementation. Kolba and Parks' prime factor algorithm (PFA) uses the same reordering technique as the Winograd Fourier transform algorithm (WFTA). The original PFA (Kolba and Parks, 1978) has been modified (Burrus and Eschenbacher, 1980) so it can transform the same sequence lengths as the WFTA.

This section presents the theory of the WFTA "small-N" algorithms, the data reordering (which is the same for PFA and WFTA), the PFA theory, the real operations count, and the memory array requirements for both PFA and WFTA. Since both algorithms follow a similar development, the conversion of a DFT to circular convolution and the data reordering are presented only once and apply to both algorithms.

3.5.1 Converting a DFT to Circular Convolution.

To convert the DFT expression to a circular convolution, the DFT matrix $[W]$ must be "mapped" into the circular convolution matrix $[W_c]$. The mapping between these two matrices, and hence the basis for the WFTA and PFA, was developed by Rader in 1968.

Rader showed that if "N is prime, there is some number g, not necessary unique, such that a one-to-one mapping from the integers i = 1, 2, ..., N-1 to the integers j = 1, 2, ..., N-1 is given by:

$$j = ((g^i))_N \tag{3.190}$$

where the notation $((x))_N$ implies x modulo N." The example of N = 7 and g = 3 using the mapping of Eq (3.190) gives:

| i | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| j | 3 | 2 | 6 | 4 | 5 | 1 |

The number g is referred to as a "primitive root" in number theory. The mapping of Eq (3.190) provides the convolution matrix $[W_c]$ from the DFT matrix $[W]$. Examples of this mapping are extensively treated in the references (Silverman, 1977; Kolba and Parks, 1977) and are not repeated in this paper.
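The mapping of Eq (3.190) is easy to tabulate and check by machine. The sketch below uses two illustrative functions, `is_primitive_root` and `rader_mapping` (our names, not from the cited programs), to regenerate the N = 7, g = 3 table and to list every primitive root of 7.

```python
def is_primitive_root(g, N):
    """True if g**i mod N takes every value 1..N-1 for i = 1..N-1 (N prime)."""
    return sorted(pow(g, i, N) for i in range(1, N)) == list(range(1, N))

def rader_mapping(N, g):
    """Return [((g**i))_N for i = 1, ..., N-1], the mapping of Eq (3.190)."""
    if not is_primitive_root(g, N):
        raise ValueError(f"{g} is not a primitive root of {N}")
    return [pow(g, i, N) for i in range(1, N)]

print(rader_mapping(7, 3))                                  # [3, 2, 6, 4, 5, 1]
print([g for g in range(2, 7) if is_primitive_root(g, 7)])  # [3, 5]
```

As the quotation says, g is not unique: 5 is also a primitive root of 7 and gives a different, equally valid ordering.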

A brief example of using the results of the convolution matrix is presented to aid in developing the small-N algorithm operations count. Consider the following 3-point DFT written in matrix notation as:

$$\begin{bmatrix} X(0) \\ X(1) \\ X(2) \end{bmatrix} = \begin{bmatrix} W^0 & W^0 & W^0 \\ W^0 & W^1 & W^2 \\ W^0 & W^2 & W^1 \end{bmatrix} \begin{bmatrix} x(0) \\ x(1) \\ x(2) \end{bmatrix} \tag{3.191}$$

where $W = W_3 = \exp(-j2\pi/3)$ is assumed and $W^4 = W^1$. The circular convolution is given by:

$$\begin{bmatrix} \tilde{X}(1) \\ \tilde{X}(2) \end{bmatrix} = \begin{bmatrix} W^1 & W^2 \\ W^2 & W^1 \end{bmatrix} \begin{bmatrix} x(1) \\ x(2) \end{bmatrix} \tag{3.192}$$

which provides $\tilde{X}(1)$ and $\tilde{X}(2)$. The DFT in Eq (3.191) can then be rewritten using Eq (3.192) to give:

$$\begin{aligned} X(0) &= W^0\bigl(x(0) + x(1) + x(2)\bigr) \\ X(1) &= W^0 x(0) + \tilde{X}(1) \\ X(2) &= W^0 x(0) + \tilde{X}(2) \end{aligned} \tag{3.193}$$

Using techniques similar to the one presented here, convolution expressions to perform DFTs have been developed for N = 2, 4, 5, 7, 8, 9 and 16.
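Eq (3.193) is the N = 3 case of Rader's reordering: apart from X(0), every output is x(0) plus an (N-1)-point circular convolution of reordered inputs with reordered powers of W. The sketch below (function names are our own) carries this out for any prime N and checks it against a direct DFT. It computes the convolution directly, so it demonstrates the reordering only; a fast convolution, as in the WFTA, would replace the inner sum. It requires Python 3.8+ for `pow` with a negative exponent and a modulus.

```python
import cmath

def dft(x):
    """Direct DFT: X(k) = sum_n x(n) W^(nk), W = exp(-j 2 pi / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def rader_dft(x, g):
    """DFT of prime length N via Rader's reordering into a circular convolution.

    g must be a primitive root of N. The (N-1)-point circular convolution is
    evaluated directly here.
    """
    N = len(x)

    def W(e):
        return cmath.exp(-2j * cmath.pi * e / N)

    perm = [pow(g, q, N) for q in range(N - 1)]     # g^q mod N
    inv = [pow(g, -q, N) for q in range(N - 1)]     # g^(-q) mod N
    a = [x[inv[q]] for q in range(N - 1)]           # reordered inputs
    b = [W(perm[q]) for q in range(N - 1)]          # reordered kernel W^(g^q)
    X = [sum(x)] + [0j] * (N - 1)                   # X(0) needs no convolution
    for p in range(N - 1):
        conv = sum(a[q] * b[(p - q) % (N - 1)] for q in range(N - 1))
        X[perm[p]] = x[0] + conv                    # cf. Eq (3.193)
    return X

x7 = [1, 2 + 1j, -0.5, 3, 0, 1j, 2]
print(max(abs(u - v) for u, v in zip(rader_dft(x7, 3), dft(x7))))
```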


3.5.2 Reordering the Data Arrays. Implementing the WFTA or the PFA in a useful form involves making long transforms from the short, fast-convolution transforms for 2, 3, 4, 5, 7, 8, 9, and 16. The general idea is "to convert a one-dimensional length $N = M_1 M_2 \cdots M_i$ transform into an i-dimensional transform requiring computation of i shorter length $M_k$ transforms for k = 1, 2, ..., i" (Kolba and Parks, 1977). The mapping from one dimension to i dimensions is based on the Chinese Remainder Theorem, which requires relatively prime factors $M_1, M_2, \ldots, M_i$.

The example for two mutually prime factors given by Kolba and Parks (1977) is presented here because the mapping is common to both WFTA and PFA. In the DFT:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk} \tag{3.194}$$

the index n of the input sequence is referred to as the input index, and the index k of the output sequence X(k) is called the output index. Mapping from one to two dimensions maps the input index n into a pair of indices $(n_1, n_2)$:

$$n_1 = r_1 n \bmod M_1, \quad n_1 = 0, \ldots, M_1 - 1, \quad r_1 = M_2^{-1} \bmod M_1$$

$$n_2 = r_2 n \bmod M_2, \quad n_2 = 0, \ldots, M_2 - 1, \quad r_2 = M_1^{-1} \bmod M_2$$

The output index is mapped by:

$$k_1 = k \bmod M_1, \quad k_1 = 0, \ldots, M_1 - 1$$

$$k_2 = k \bmod M_2, \quad k_2 = 0, \ldots, M_2 - 1$$

The inverse mapping from two dimensions to one dimension for the output index is:

$$k = (s_1 k_1 + s_2 k_2) \bmod N \tag{3.195}$$

where

$$s_1 \equiv 1 \pmod{M_1}, \quad s_1 \equiv 0 \pmod{M_2}, \qquad s_2 \equiv 0 \pmod{M_1}, \quad s_2 \equiv 1 \pmod{M_2}$$

While the same inverse mapping in Eq (3.195) could be used for the input index n, it is more convenient (Kolba and Parks, 1977) to use:

$$n = (M_2 n_1 + M_1 n_2) \bmod N \tag{3.196}$$

When the mappings in Eqs (3.195) and (3.196) are used, the DFT becomes:

$$X(k_1, k_2) = \sum_{n_1=0}^{M_1-1} \sum_{n_2=0}^{M_2-1} x(n_1, n_2)\, W_{M_2}^{n_2 k_2}\, W_{M_1}^{n_1 k_1} \tag{3.197}$$

At this point the WFTA and PFA approach the implementation of Eq (3.197) differently, as seen below.
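The index maps can be verified numerically: with the input map of Eq (3.196) and the output map of Eq (3.195), the double sum of Eq (3.197) reproduces the one-dimensional DFT exactly, and no twiddle factors appear between the two dimensions. The sketch below (function names are our own) builds the CRT coefficients $s_1, s_2$ with Python's modular inverse `pow(a, -1, m)` (Python 3.8+).

```python
import cmath

def dft(x):
    """Direct DFT: X(k) = sum_n x(n) W_N^(nk)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def two_factor_maps(M1, M2):
    """Index maps of Eqs (3.195)-(3.196) for relatively prime M1, M2 (N = M1*M2)."""
    N = M1 * M2
    # CRT coefficients: s1 = 1 mod M1, 0 mod M2; s2 = 0 mod M1, 1 mod M2.
    s1 = M2 * pow(M2, -1, M1) % N
    s2 = M1 * pow(M1, -1, M2) % N

    def n_of(n1, n2):                   # input map, Eq (3.196)
        return (M2 * n1 + M1 * n2) % N

    def k_of(k1, k2):                   # output map, Eq (3.195)
        return (s1 * k1 + s2 * k2) % N

    return n_of, k_of

def dft_eq_3_197(x, M1, M2):
    """Evaluate the two-dimensional form of Eq (3.197); return X indexed by k."""
    N = M1 * M2
    n_of, k_of = two_factor_maps(M1, M2)
    X = [0j] * N
    for k1 in range(M1):
        for k2 in range(M2):
            X[k_of(k1, k2)] = sum(
                x[n_of(n1, n2)]
                * cmath.exp(-2j * cmath.pi * n2 * k2 / M2)   # W_M2^(n2 k2)
                * cmath.exp(-2j * cmath.pi * n1 * k1 / M1)   # W_M1^(n1 k1)
                for n1 in range(M1) for n2 in range(M2))
    return X

x = [complex(n, (n * n) % 5) for n in range(12)]
print(max(abs(u - v) for u, v in zip(dft_eq_3_197(x, 3, 4), dft(x))))
```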

3.5.3 The Winograd Fourier Transform. A new algorithm for computing the DFT was proposed by Winograd in July 1975. The WFTA keeps the number of real additions at the FFT level while reducing the number of real multiplications necessary to evaluate the DFT (Silverman, 1977). This paper will not derive the "small-N" algorithms. Readers interested in the derivation of the WFTA are referred to the articles which treat the topic extensively (Winograd, 1976; Silverman, 1977; Kolba and Parks, 1977; Zohar, 1979).

Winograd's proof started with the N by N matrix with elements:

$$W_N^{ir} = W_N^{ir \bmod N} = Q_N(i, r) \tag{3.198}$$

which can be decomposed to:

$$Q_N = O_N D_N I_N \tag{3.199}$$

where $I_N$ is a u by N incidence matrix with values of 0, 1, and -1 only, $D_N$ is a u by u diagonal matrix, and $O_N$ is an N by u incidence matrix (Silverman, 1977). The decomposition of $Q_N$ is possible with large values of u relative to N (i.e., $u = N^2$). Winograd solved the more difficult problem of decomposing $Q_N = O_N D_N I_N$ with incidence matrices whose dimension u is smaller than $N^2$. Winograd applied field theory to give solutions where u approximately equals N for small values of N, namely N = 2, 3, 4, 5, 7, 8, 9, and 16 (Silverman, 1977).

Not only did Winograd prove the minimum multiplication count for the above small-N DFTs, but he also proposed a special structure for Eq (3.197) using Eq (3.199). The two-dimensional transform in Eq (3.197) may be implemented by first calculating $M_1$ length-$M_2$ DFTs:

$$y(n_1, k_2) = \sum_{n_2=0}^{M_2-1} x(n_1, n_2)\, W_{M_2}^{n_2 k_2} \tag{3.200}$$

and then calculating $M_2$ length-$M_1$ DFTs:

$$X(k_1, k_2) = \sum_{n_1=0}^{M_1-1} y(n_1, k_2)\, W_{M_1}^{n_1 k_1} \tag{3.201}$$


Using the notation of Eq (3.199), the short length-$M_1$ transform can be written in terms of the input additions $I^{(1)}$, output additions $O^{(1)}$, and multiplications $d^{(1)}$. The length-$M_2$ transform uses $I^{(2)}$, $O^{(2)}$, and $d^{(2)}$ (Kolba and Parks, 1977). Here $u_1$ and $u_2$ denote the dimension u of Eq (3.199) for the two factors. Eq (3.200) then becomes:

$$y(n_1, k_2) = \sum_{r=0}^{u_2-1} O^{(2)}_{k_2 r}\, d^{(2)}_r \sum_{n_2=0}^{M_2-1} I^{(2)}_{r n_2}\, x(n_1, n_2) \tag{3.202}$$

$X(k_1, k_2)$ in Eq (3.201) is a length-$M_1$ transform of $y(n_1, k_2)$, which can also be written:

$$X(k_1, k_2) = \sum_{m=0}^{u_1-1} O^{(1)}_{k_1 m}\, d^{(1)}_m \sum_{n_1=0}^{M_1-1} I^{(1)}_{m n_1}\, y(n_1, k_2) \tag{3.203}$$

Substituting Eq (3.202) into Eq (3.203) gives:

$$X(k_1, k_2) = \sum_{m=0}^{u_1-1} O^{(1)}_{k_1 m}\, d^{(1)}_m \sum_{n_1=0}^{M_1-1} I^{(1)}_{m n_1} \sum_{r=0}^{u_2-1} O^{(2)}_{k_2 r}\, d^{(2)}_r \sum_{n_2=0}^{M_2-1} I^{(2)}_{r n_2}\, x(n_1, n_2) \tag{3.204}$$

The order of summation may be interchanged to "nest" the multiplications in the center, which gives Eq (3.204) rewritten as:

$$X(k_1, k_2) = \sum_{r=0}^{u_2-1} O^{(2)}_{k_2 r} \sum_{m=0}^{u_1-1} O^{(1)}_{k_1 m} \left[ d^{(1)}_m\, d^{(2)}_r \sum_{n_1=0}^{M_1-1} I^{(1)}_{m n_1} \sum_{n_2=0}^{M_2-1} I^{(2)}_{r n_2}\, x(n_1, n_2) \right] \tag{3.205}$$

Eq (3.205) is the form that was implemented in FORTRAN code (McClellan and Nawab, 1979) and listed in Appendix H.
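The nesting of Eq (3.205) can be made concrete with a small NumPy sketch for N = 6 = 2·3: both factors' input additions are applied first, all $u_1 u_2$ multiplications $d^{(1)}_m d^{(2)}_r$ are applied together in the center, and the output additions come last. The $(O, d, I)$ matrices for N = 3 are those of Eq (3.214) in the N = 3 example of this section; for N = 2 we assume $O = $ identity, $d = (1, 1)$, $I = [[1, 1], [1, -1]]$, which matches the N = 2 row of Table 3.6. This is a teaching sketch, not the McClellan and Nawab program.

```python
import numpy as np

# (O, d, I) for each small-N factor, so that DFT_N = O @ diag(d) @ I.
SMALL_N = {
    2: (np.eye(2),
        np.array([1.0, 1.0]),
        np.array([[1.0, 1.0], [1.0, -1.0]])),
    3: (np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, 1.0, -1.0]]),
        np.array([1.0, -1.5, -1j * np.sqrt(3) / 2]),
        np.array([[1.0, 1.0, 1.0], [0.0, 1.0, 1.0], [0.0, 1.0, -1.0]])),
}

def wfta_two_factor(x, M1, M2):
    """Nested two-factor WFTA following Eq (3.205); N = M1*M2, gcd(M1, M2) = 1."""
    N = M1 * M2
    O1, d1, I1 = SMALL_N[M1]
    O2, d2, I2 = SMALL_N[M2]
    # Input reordering, Eq (3.196): x2[n1, n2] = x((M2*n1 + M1*n2) mod N)
    x2 = np.array([[x[(M2 * n1 + M1 * n2) % N] for n2 in range(M2)]
                   for n1 in range(M1)], dtype=complex)
    t = I1 @ x2 @ I2.T              # input additions of both factors
    t = np.outer(d1, d2) * t        # all multiplications nested in the center
    X2 = O1 @ t @ O2.T              # output additions of both factors
    # Output reordering, Eq (3.195), with CRT coefficients s1 and s2
    s1 = M2 * pow(M2, -1, M1) % N
    s2 = M1 * pow(M1, -1, M2) % N
    X = np.zeros(N, dtype=complex)
    for k1 in range(M1):
        for k2 in range(M2):
            X[(s1 * k1 + s2 * k2) % N] = X2[k1, k2]
    return X

x = np.array([1, 2 - 1j, 0.5, -1, 3j, 2])
print(np.allclose(wfta_two_factor(x, 2, 3), np.fft.fft(x)))
print("center multiplications:", len(SMALL_N[2][1]) * len(SMALL_N[3][1]))
```

The center multiplication count is the product of the per-factor counts, which is the structure behind the product formula for NMULT in Section 3.5.5.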

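As an example of the "nesting" structure for the WFTA, consider the case of N = 3 given in Eqs (3.190) through (3.192). First, let

$$\begin{bmatrix} \tilde{X}(1) \\ \tilde{X}(2) \end{bmatrix} = \begin{bmatrix} M_1 + M_2 \\ M_1 - M_2 \end{bmatrix} \tag{3.206}$$

where $\tilde{X}(1)$ and $\tilde{X}(2)$ are the circular convolution outputs of Eq (3.192). Equating Eqs (3.206) and (3.192) gives:

$$\begin{bmatrix} M_1 + M_2 \\ M_1 - M_2 \end{bmatrix} = \begin{bmatrix} x(1)W^1 + x(2)W^2 \\ x(1)W^2 + x(2)W^1 \end{bmatrix} \tag{3.207}$$

Substituting

$$W^1 = \exp(-j2\pi/3) = -\tfrac{1}{2} - j\tfrac{\sqrt{3}}{2}, \qquad W^2 = \exp(-j4\pi/3) = -\tfrac{1}{2} + j\tfrac{\sqrt{3}}{2} \tag{3.208}$$

into Eq (3.207) provides:

$$\begin{aligned} M_1 + M_2 &= -\tfrac{x(1)}{2} - j\tfrac{\sqrt{3}}{2}\,x(1) - \tfrac{x(2)}{2} + j\tfrac{\sqrt{3}}{2}\,x(2) \\ M_1 - M_2 &= -\tfrac{x(1)}{2} + j\tfrac{\sqrt{3}}{2}\,x(1) - \tfrac{x(2)}{2} - j\tfrac{\sqrt{3}}{2}\,x(2) \end{aligned} \tag{3.209}$$

Solving for $M_1$ and $M_2$ gives:

$$M_1 = -\tfrac{1}{2}\bigl(x(1) + x(2)\bigr), \qquad M_2 = -j\tfrac{\sqrt{3}}{2}\bigl(x(1) - x(2)\bigr) \tag{3.210}$$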
For this algorithm to be used in Winograd's algorithm, the multiplications by $W^0 = 1$ must be accounted for and minimized. This is accomplished by modifying the length-3 DFT to:

$$a_1 = x(1) + x(2), \quad a_2 = x(1) - x(2), \quad a_3 = x(0) + a_1 \tag{3.211}$$

$$M_1 = \left(-\tfrac{1}{2} - 1\right)a_1 = -\tfrac{3}{2}a_1, \quad M_2 = -j\tfrac{\sqrt{3}}{2}a_2, \quad M_3 = W^0 a_3 = a_3 \tag{3.212}$$

$$C_1 = M_3 + M_1, \quad X(0) = M_3, \quad X(1) = C_1 + M_2, \quad X(2) = C_1 - M_2 \tag{3.213}$$

Eqs (3.211) through (3.213) result in 2 multiplications, 1 multiplication by $W^0$, and 6 additions, which can now be expressed in the $X = O \cdot D \cdot I \cdot x$ notation as:

$$\begin{bmatrix} X(0) \\ X(1) \\ X(2) \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & -3/2 & 0 \\ 0 & 0 & -j\sqrt{3}/2 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} x(0) \\ x(1) \\ x(2) \end{bmatrix} \tag{3.214}$$

and then rewritten in summation form as:

$$X(k) = \sum_{r=0}^{u-1} O_{kr}\, d_r \sum_{n=0}^{N-1} I_{rn}\, x(n) \tag{3.215}$$
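Eqs (3.211) through (3.213) translate line for line into code. The sketch below (the name `wfta3` is our own) evaluates the 3-point WFTA module with its 2 nontrivial multiplications, 1 multiplication by $W^0$ (here a plain copy) and 6 additions, and compares the result with a direct DFT.

```python
import cmath
import math

def wfta3(x0, x1, x2):
    """3-point DFT from Eqs (3.211)-(3.213)."""
    a1 = x1 + x2                          # Eq (3.211): input additions
    a2 = x1 - x2
    a3 = x0 + a1
    m1 = -1.5 * a1                        # Eq (3.212): (cos(2pi/3) - 1) * a1
    m2 = -1j * (math.sqrt(3) / 2) * a2    #             -j sin(2pi/3) * a2
    m3 = a3                               #             W^0 * a3
    c1 = m3 + m1                          # Eq (3.213): output additions
    return [m3, c1 + m2, c1 - m2]

def dft(x):
    """Direct DFT for comparison."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [1 + 2j, -0.5, 3 - 1j]
print(wfta3(*x))
print(dft(x))
```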


The fast convolution cases for N = 2, 4, 5, 7, 8, 9, and 16 were developed in a similar way to the method used for N = 3 above. The explicit equations for these cases provided the small-N operations count shown in Table 3.6, which is used in computing the real operations count as a function of N for the WFTA.

3.5.4 The Prime Factor Algorithm Theory. An alternative to the nested algorithm proposed by Winograd was developed by Kolba and Parks. Because of the algorithm's structure it is called the prime factor algorithm (PFA), and it uses a modified version of Winograd's high-speed convolution technique.

Converting the DFT to circular convolution and reordering the data arrays for the PFA is identical up through Eq (3.197), where $W_{M_1} = \exp(-j2\pi/M_1)$ and $W_{M_2} = \exp(-j2\pi/M_2)$, with $M_1$ and $M_2$ relatively prime. The transform in Eq (3.197) may be performed by calculating $M_1$ length-$M_2$ DFTs:

$$y(n_1, k_2) = \sum_{n_2=0}^{M_2-1} x(n_1, n_2)\, W_{M_2}^{n_2 k_2} \tag{3.216}$$

then calculating $M_2$ length-$M_1$ DFTs:

$$X(k_1, k_2) = \sum_{n_1=0}^{M_1-1} y(n_1, k_2)\, W_{M_1}^{n_1 k_1} \tag{3.217}$$

The expressions in Eqs (3.216) and (3.217) are implemented as short DFTs instead of the "nested" operations shown in Eq (3.205).
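Eqs (3.216) and (3.217) amount to a row-column computation on the reordered data with no twiddle factors between the passes. The NumPy sketch below follows that structure; `short_dft` is a hypothetical stand-in (a direct matrix DFT) for the hand-optimized small-N modules of Table 3.7, so it shows the data flow of the PFA rather than its operation count.

```python
import math
import numpy as np

def short_dft(v):
    """Direct short DFT; stands in for a hand-optimized small-N module."""
    M = len(v)
    n = np.arange(M)
    return np.exp(-2j * np.pi * np.outer(n, n) / M) @ v

def pfa_two_factor(x, M1, M2):
    """PFA for N = M1*M2 with gcd(M1, M2) = 1, following Eqs (3.216)-(3.217)."""
    if math.gcd(M1, M2) != 1:
        raise ValueError("factors must be relatively prime")
    N = M1 * M2
    x = np.asarray(x, dtype=complex)
    # Input reordering, Eq (3.196)
    x2 = np.array([[x[(M2 * n1 + M1 * n2) % N] for n2 in range(M2)]
                   for n1 in range(M1)])
    # Eq (3.216): M1 length-M2 DFTs, one per row n1 (no twiddle factors)
    y = np.array([short_dft(x2[n1, :]) for n1 in range(M1)])
    # Eq (3.217): M2 length-M1 DFTs, one per column k2
    X2 = np.array([short_dft(y[:, k2]) for k2 in range(M2)]).T
    # Output reordering, Eq (3.195)
    s1 = M2 * pow(M2, -1, M1) % N
    s2 = M1 * pow(M1, -1, M2) % N
    X = np.empty(N, dtype=complex)
    for k1 in range(M1):
        for k2 in range(M2):
            X[(s1 * k1 + s2 * k2) % N] = X2[k1, k2]
    return X

x = np.arange(15) + 1j * np.arange(15)[::-1]
print(np.allclose(pfa_two_factor(x, 3, 5), np.fft.fft(x)))
```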
TABLE 3.6

SMALL-N OPERATIONS COUNT FOR WFTA

| N | Mult | Mult by $W^0$ | Adds |
|---|---|---|---|
| 2 | 0 | 2 | 2 |
| 3 | 2 | 1 | 6 |
| 4 | 0 | 4 | 8 |
| 5 | 5 | 1 | 17 |
| 7 | 8 | 1 | 36 |
| 8 | 2 | 6 | 26 |
| 9 | 12 | 1 | 44 |
| 16 | 10 | 8 | 74 |

For both algorithm structures the small-N equations are the same; only the implementation is different. In the case of the PFA structure, the small-N algorithms are modified to permit a "shift operation" instead of a multiplication by 1/2. For the N = 3 example, Eqs (3.211) through (3.213) are modified to:

$$a_1 = x(1) + x(2), \quad a_2 = x(1) - x(2), \quad a_3 = x(0) + a_1 \tag{3.218}$$

$$M_1 = -\tfrac{1}{2}a_1, \quad M_2 = -j\tfrac{\sqrt{3}}{2}a_2 \tag{3.219}$$

$$C_1 = x(0) + M_1, \quad X(0) = a_3, \quad X(1) = C_1 + M_2, \quad X(2) = C_1 - M_2 \tag{3.220}$$

Eqs (3.218) through (3.220) have 1 multiplication, 1 shift (multiplication by 1/2), and 6 additions.
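The PFA form of the 3-point module differs from the WFTA form only in how $M_1$ is built, which is easiest to see side by side in code. In the sketch below (the name `pfa3` is our own), the halving in Eq (3.219) is written as an ordinary division; the point of the "shift" is that in fixed-point arithmetic the same step can be done with an arithmetic right shift instead of a multiplier.

```python
import cmath
import math

def pfa3(x0, x1, x2):
    """3-point DFT in the PFA form of Eqs (3.218)-(3.220).

    Real-data cost: 1 multiplication (by sqrt(3)/2), 1 halving, 6 additions.
    """
    a1 = x1 + x2                          # Eq (3.218)
    a2 = x1 - x2
    a3 = x0 + a1
    m1 = -(a1 / 2)                        # Eq (3.219): the halving done by a shift
    m2 = -1j * (math.sqrt(3) / 2) * a2    # the only true multiplication
    c1 = x0 + m1                          # Eq (3.220)
    return [a3, c1 + m2, c1 - m2]

def dft(x):
    """Direct DFT for comparison."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [2.0, -1.0, 0.5]
print(pfa3(*x))
print(dft(x))
```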

Similar small-N DFTs result for N = 2, 4, 5, 7, 8, 9 and 16, producing the operations count for the PFA small-N algorithms shown in Table 3.7 (Burrus and Eschenbacher, 1980). (Complex valued sequences require the count in Table 3.7 to be doubled.) If the implementation of the PFA does not use "shifts", the multiplication count must be adjusted to reflect the multiplications by 1/2. The original FORTRAN program (Kolba, 1977) did not include the factor of 16. Later modifications (Burrus and Eschenbacher, 1980) included the factor of 16, which made the PFA capable of transforming the same sequence lengths as the WFTA. It should be noted that neither FORTRAN version implemented the "shifts", which increased the number of real multiplications.

TABLE 3.7

PFA SMALL-N DFT OPERATIONS COUNT

| N | Multiplies | Shifts | Adds |
|---|---|---|---|
| 2 | 0 | 0 | 2 |
| 3 | 1 | 1 | 6 |
| 4 | 0 | 0 | 8 |
| 5 | 4 | 2 | 17 |
| 7 | 8 | 0 | 36 |
| 8 | 2 | 0 | 26 |
| 9 | 8 | 2 | 49 |
| 16 | 10 | 0 | 74 |

NOTE: For complex sequences the values in the table must be doubled.

3.5.5  Real  Qperat ions  for  WFTA ■  To  use  the  WFTA 
the  N  length  sequence  must  be  factorable  into  R  relatively 
prime  factors  N.  N0  ...  Nn  where  each  factor  corresponds 
to  one  of  the  Winograd  small-N  algorithms  for  2, 3, 4, 5, 7, 8, 

9  and  16.  It  has  been  shown  (Silverman,  1977)  that  the 

number  of  real  multiplications  is  a  function  of  the  factors 

of  N.  To  aid  in  the  development  of  the  number  of  real 

operations  the  following  terms  are  defined: 

Mr  =  number  of  real  multiplications  in  factor  Nr 

A_  =  number  of  real  additions  in  factor  N 
r  r 

Nr  =  rfc^  factor  of  N 

Winograd  proved  that  the  matrix  is  an  MR  by  MR  diagonal 
matrix  with  only  0,  1,  or  -1  for  diagonal  entries  and  0N 

and  IN  are  N  by  and  by  N  incidence  matrices,  respec¬ 
tively.  To  evaluate  the  nested  multiplications  of  D 
(Silverman,  1977)  requires: 

NMULT  =  M1  M2  ...  Mr  (3.221) 

which  is  the  real  multiplications  count  for  real  valued 
sequences.  For  complex  valued  transforms  Eq  (3.221)  must 
be  multiplied  by  2. 
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All  previous  mu  1  t:  1  r>  1  i  ci  t.  i  onr,  counts  (Wino'jrau,  19  76; 

Kolba  and  Parks,  1977;  Silverman,  1977)  use  only  Ec;  (3.221) 
as  the  source  of  real  multiplications  for  the  WFTA..  The 
multiplications  in  Eq  (3.221)  are  all  performed  by  the  MULT 
subroutine  in  Figure  3.31.  Other  real  multiplications  are 
required  in  the  WFTA  for  computing  the  multiplier  coefficients 
and  determining  the  input  and  output  permutation  vectors 
of  the  INISHL  subroutine  in  Figure  3.31. 

The  DFT  multiplier  coefficients  are  computed  in  lines 
1450-1510  of  the  WFTA  listed  in  Appendix  H  and  require: 

real  mult  =  3  *  NMULT  (3.222) 

where  N  MULT  was  computed  in  Eq  (3.221).  Determining  the 
output  permutation  vector  in  lines  2080-2170  requires: 

real  mult  =  4  *  N  (3.223) 

where N is the sequence length to be transformed. Combining
Eqs (3.222) and (3.223) gives the number of real multiplications
required for initializing the WFTA. Subsequent
transforms of the same sequence length do not require
initialization. The first complex transform of length N
using the WFTA therefore requires:

real  mult  =  2  *  NMULT  +  3  *  NMULT  +  4  *  N  (3.224) 
Subsequent  complex  transforms  require: 

real  mult  =  2  *  NMULT  (3.225) 
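The split between the one-time INISHL cost and the per-transform cost can be sketched as follows. This minimal sketch assumes NMULT has already been computed from Eq (3.221). `wfta_real_mults` is a hypothetical helper name.

```python
def wfta_real_mults(nmult, n, first_transform):
    """Real multiplications for one complex WFTA transform of length n.

    nmult: NMULT = M1 * M2 * ... * MR from Eq (3.221).
    first_transform: True if INISHL must still compute the multiplier
    coefficients (3 * NMULT, Eq 3.222) and the output permutation
    vector (4 * N, Eq 3.223).
    """
    per_transform = 2 * nmult                     # Eq (3.225)
    if not first_transform:
        return per_transform
    return per_transform + 3 * nmult + 4 * n      # Eq (3.224)


if __name__ == "__main__":
    print(wfta_real_mults(1188, 630, first_transform=True))   # 8460
    print(wfta_real_mults(1188, 630, first_transform=False))  # 2376
```

With NMULT = 1188 for N = 630, the first call needs 8460 real multiplications in this sketch. Every later call needs 2376.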

Counting the number of real additions is more complicated
because the factorization order of N changes the
real additions count (Silverman, 1977). For a given
factorization of N = N1 ... NR, the number of real additions
the WEAVE 1 and WEAVE 2 subroutines in Figure 3.31. First,
the real additions from the "WEAVEs" can be developed by
considering the special case of N = N1 · N2 ... NR. N1 is defined
as the "innermost" factor and NR is the "outermost" factor.
For two factors of N, Silverman has shown the number of
real additions to be:

$$A(2) = N_1 A_2 + M_2 A_1 \qquad (3.226)$$

(Recall that A2 is the number of real additions needed to evaluate
factor N2, and M2 is the number of real multiplications needed to
evaluate N2.) Now consider N = N1 N2 N3, where (N1 N2) is considered
to be the "innermost" factor. The number of real additions becomes:

$$A(3) = (N_1 N_2) A_3 + M_3 A(2) = N_1 N_2 A_3 + M_3 N_1 A_2 + M_3 M_2 A_1 \qquad (3.227)$$

By iterative substitution, the number of additions for
N = N1 N2 N3 N4 becomes:

$$A(4) = (N_1 N_2 N_3) A_4 + M_4 A(3) = N_1 N_2 N_3 A_4 + M_4 N_1 N_2 A_3 + M_4 M_3 N_1 A_2 + M_4 M_3 M_2 A_1 \qquad (3.228)$$

Eqs  (3.226)  through  (3.228)  are  used  to  write  a  compact 
expression  for  the  number  of  real  additions  needed  in  the 
WEAVE  subroutines: 
$$A(R) = 2 \sum_{r=1}^{R} \left( \prod_{j=1}^{r-1} N_j \right) A_r \left( \prod_{i=r+1}^{R} M_i \right) \qquad (3.229)$$

In this expression the factor of 2 accounts for complex data, and an empty product equals 1.
The expression in Eq (3.229) represents only the real additions
used in WEAVE 1 and WEAVE 2. Other additions are required by
the INISHL initialization subroutine to index the DFT
coefficient array and compute the output index vector.
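Eq (3.229) can be evaluated directly for any factor ordering, which makes the ordering dependence noted by Silverman easy to see. The minimal sketch below assumes complex data and uses the (M, A) pairs of Table 3.8. The factor order is given innermost first, and the helper names are hypothetical. The brute-force search over orderings is practical only because R is at most four here.

```python
from itertools import permutations
from math import prod

# (M, A): real multiplications and additions per small-N module, Table 3.8.
SMALL_N = {2: (2, 2), 3: (3, 6), 4: (4, 8), 5: (6, 17),
           7: (9, 36), 8: (8, 26), 9: (11, 44), 16: (18, 74)}


def weave_adds(order):
    """Eq (3.229): WEAVE 1 + WEAVE 2 real additions for complex data.

    order: factors N1, ..., NR, innermost first. Term r is
    (N1 ... N_{r-1}) * A_r * (M_{r+1} ... M_R); empty products are 1.
    """
    total = 0
    for r, factor in enumerate(order):
        inner = prod(order[:r])                              # N1 ... N_{r-1}
        outer = prod(SMALL_N[f][0] for f in order[r + 1:])   # M_{r+1} ... M_R
        total += inner * SMALL_N[factor][1] * outer
    return 2 * total


def best_order(factors):
    """Exhaustively pick the ordering with the fewest WEAVE additions."""
    return min(permutations(factors), key=weave_adds)


if __name__ == "__main__":
    print(weave_adds((2, 5, 7, 9)))   # 23188
    print(weave_adds((5, 7, 9, 2)))   # 22072
    print(best_order([2, 5, 7, 9]))   # (5, 7, 9, 2)
```

With the order 5, 7, 9, 2 the sketch gives 22072 additions for N = 630, which is the WFTA figure quoted in Section 4.5.3. The order 2, 5, 7, 9 costs 23188. Which order the Appendix H program actually uses is not stated in this section.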

The  DFT  coefficient  array  is  indexed  with  a  J  counter 
in  line  1500  of  the  FORTRAN  WFTA  program  in  Appendix  H. 

This  part  of  the  INISHL  subroutine  requires  NMULT  real 
additions.  The  input  index  array  INDX1  requires  another 
J  counter  in  line  1720  which  uses  N  real  additions.  The 
output  index  array  INDX2  uses  a  J  counter  in  line  2160 
which  uses  N  real  additions.  Also  the  INDX2  computation 
requires  8N  real  additions  in  line  2120. 

Totaling  the  real  additions  in  the  initialization 
subroutine  gives: 

real  adds  =  NMULT  +  10N  (3.330) 

Adding the result of Eq (3.330) to Eq (3.229) gives the
total additions needed to transform an N length sequence for
the first time. Subsequent transforms of the same sequence
length require only the number of adds in Eq (3.229).
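On the first call, the initialization additions add to the WEAVE count. The minimal sketch below uses a hypothetical helper. It assumes the WEAVE count comes from Eq (3.229) and NMULT comes from Eq (3.221).

```python
def wfta_real_adds(weave_adds, nmult, n, first_transform):
    """Real additions for one complex WFTA transform of length n.

    weave_adds: WEAVE 1 + WEAVE 2 additions from Eq (3.229).
    nmult: NMULT from Eq (3.221).
    On the first transform INISHL also spends (Eq 3.330):
      NMULT  indexing the DFT coefficient array (line 1500)
      N      for the INDX1 counter (line 1720)
      N      for the INDX2 counter (line 2160)
      8N     for the INDX2 computation (line 2120)
    """
    if not first_transform:
        return weave_adds
    return weave_adds + nmult + n + n + 8 * n


if __name__ == "__main__":
    print(wfta_real_adds(22072, 1188, 630, first_transform=True))   # 29560
    print(wfta_real_adds(22072, 1188, 630, first_transform=False))  # 22072
```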

The FORTRAN WFTA program written by McClellan and
Nawab (1979) decreased the number of real multiplications
for N=9 from 13 to 11, while the number of additions remained
constant at 44. Modifying Table 3.6 to reflect the new
multiply count for N=9 gives the McClellan and Nawab real
operations count listed in Table 3.8.

Using Eqs (3.220) and (3.330) with Table 3.8 gives the
number of real operations for all permissible WFTA sequence
lengths, shown in Table 3.9a and b. The columns labeled
"REAL MULT1" and "REAL ADD1" represent the operations for
the initial transform of length N. The columns labeled
"REAL MULT" and "REAL ADD" give the operations count for
subsequent transformations of the same sequence length.

The numbers of real operations are plotted as functions of
N in Figures 3.32 and 3.33. These graphs demonstrate the
large reduction possible after the WFTA has been initialized
for an N length sequence.

3.5.6 Memory Requirements for WFTA. The FORTRAN
subroutine WFTA listed in Appendix H requires 2348 words of
program memory when compiled for the CDC Cyber 74 computer.
The memory array requirements are given by:

XR, XI, INDX1, INDX2: length N

COEF, SR, SI: length NMULT = M1 M2 ... MR, which is
the number of multiplications required by
the factors of N. NMULT is listed
in Table 3.9a and b.

CO 3,  C04 ,  COS,  CO 3 ,  C016,  CDA ,  CDB,  CDC, 

CDD:  Total  of  88 

The original version of WFTA dimensioned INDX1, INDX2, COEF,
SR, and SI to their maximum possible lengths of 5040, 5040,
10692, 10692, and 10692, respectively. This made the memory
TABLE 3.8

McCLELLAN AND NAWAB'S WFTA
REAL OPERATIONS FOR THE SMALL-N ALGORITHMS

  N    M(N)   A(N)
  2      2      2
  3      3      6
  4      4      8
  5      6     17
  7      9     36
  8      8     26
  9     11     44
 16     18     74
Figure 3.32. Real Multiplications for WFTA.

array storage very large even for the shortest sequence
lengths:

memory  array  =  2N  +  2*5040  +  3*10692  +  88 

=  2N  +  42244  (3.331) 

The  memory  arrays  INDX1,  INDX2,  COEF,  SR,  and  SI  were 
variably  dimensioned  by  the  author's  version  of  WFTA  in 
Appendix  H.  This  reduced  the  memory  arrays  required  to: 

memory  array  =  4N  +  3NMULT  +88  (3.332) 

The  results  of  Eq  (3.332)  are  listed  in  Table  3.9a  and  b 
for  all  values  of  N.  A  comparison  of  the  memory  required 
by  Eqs  (3.331)  and  (3.332)  is  plotted  in  Figure  3.34  which 
shows  the  drastic  savings  in  memory  storage  by  using  the 
variable  dimensions.  The  "cost"  of  variable  dimensions  is 
more  work  for  the  user  of  WFTA  because  the  dimensions  must 
be  passed  to  the  WFTA  subroutine  using  more  arguments  in  the 
subroutine  call.  The  original  version  required: 

CALL  WFTA  (XR,  XI,  N,  INIT,  IERR) 

The  modified  WFTA  call  is: 

CALL WFTA (N, XR, XI, INIT, IERR, SR, SI, COEF, M, INDX1, INDX2)

where  M  =  NMULT.  The  increased  complexity  of  the  second 
call  is  worth  the  savings  of  memory  arrays. 
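The two array-memory expressions can be compared for a specific N with a short sketch. It assumes NMULT is known. For N = 630, Table 3.8 gives NMULT = 2 * 6 * 9 * 11 = 1188. The constant and function names are hypothetical labels for the dimensions quoted in the text.

```python
FIXED_INDEX_DIM = 5040    # original fixed dimension of INDX1 and INDX2
FIXED_COEF_DIM = 10692    # original fixed dimension of COEF, SR, and SI
SMALL_COEF_WORDS = 88     # CO3 ... CDD coefficient arrays (total of 88)


def fixed_array_words(n):
    """Eq (3.331): only XR and XI scale with N in the original WFTA."""
    return 2 * n + 2 * FIXED_INDEX_DIM + 3 * FIXED_COEF_DIM + SMALL_COEF_WORDS


def variable_array_words(n, nmult):
    """Eq (3.332): XR, XI, INDX1, INDX2 of length N; COEF, SR, SI of length NMULT."""
    return 4 * n + 3 * nmult + SMALL_COEF_WORDS


if __name__ == "__main__":
    print(fixed_array_words(630))           # 43504
    print(variable_array_words(630, 1188))  # 6172
```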

3.5.7 Real Operations for the PFA. The real operations
in the PFA come from two sources: reordering the data
and performing the small-N DFTs. The unscrambling step,
which maps the PFA result from arrays X and Y to arrays
A and B, requires N real additions and no multiplications.
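The second source, computing the small-N DFTs using fast
convolution, has been shown (Kolba and Parks, 1977) for
two factors (N = M1 M2) to be:

$$\text{real mult} = 2\,(M_2 u_1 + M_1 u_2) \qquad (3.333)$$

$$\text{real add} = 2\,(M_2 A_1 + M_1 A_2) \qquad (3.334)$$

for three factors:

$$\text{real mult} = 2\,(M_2 M_3 u_1 + M_1 M_3 u_2 + M_1 M_2 u_3) \qquad (3.335)$$

$$\text{real add} = 2\,(M_2 M_3 A_1 + M_1 M_3 A_2 + M_1 M_2 A_3) \qquad (3.336)$$

and for four factors:

$$\text{real mult} = 2\,(M_2 M_3 M_4 u_1 + M_1 M_3 M_4 u_2 + M_1 M_2 M_4 u_3 + M_1 M_2 M_3 u_4) \qquad (3.337)$$

$$\text{real add} = 2\,(M_2 M_3 M_4 A_1 + M_1 M_3 M_4 A_2 + M_1 M_2 M_4 A_3 + M_1 M_2 M_3 A_4) \qquad (3.338)$$

where Mi is the ith factor of N, ui is the number of multiplications
required for the length-Mi DFT, and Ai is the number of additions
required for it. The coefficient of each term is the product of the
other factors, N/Mi, which is the number of times the length-Mi DFT
is applied.

Notice that complex data transforms have been assumed in
Eqs (3.333) through (3.338), so the numbers of multiplications
and additions were multiplied by two.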
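Eqs (3.333) through (3.338) all follow one pattern: each length-Mi module is applied N/Mi times. The minimal sketch below applies that pattern for any number of factors and adds the N real additions of the output mapping. The per-factor counts in the example are assumptions. They were back-calculated so that the totals agree with the N = 630 PFA figures quoted in Section 4.5.3, and they are not copied from Table 3.10. The function name is hypothetical.

```python
from math import prod


def pfa_real_ops(factors, mults, adds):
    """Real (multiplications, additions) for a complex PFA of length N.

    factors: relatively prime factors M1, ..., MK of N.
    mults: u_i, real multiplications of the length-M_i module.
    adds: A_i, real additions of the length-M_i module.
    Generalizes Eqs (3.333)-(3.338) and adds N real additions for the
    output mapping from arrays X, Y to A, B.
    """
    n = prod(factors)
    real_mult = 2 * sum((n // m) * u for m, u in zip(factors, mults))
    real_add = 2 * sum((n // m) * a for m, a in zip(factors, adds)) + n
    return real_mult, real_add


if __name__ == "__main__":
    # Assumed per-factor counts for the factors 2, 5, 7, 9 (see lead-in).
    print(pfa_real_ops([2, 5, 7, 9], [0, 6, 8, 10], [2, 17, 36, 42]))
    # (4352, 18534)
```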

As  shown  in  the  PFA  theory  chapter  the  small-N 
algorithms  can  be  implemented  by  using  "shifts"  instead  of 
multiplications  by  1/2.  The  FORTRAN  programs  available  do 
not  make  use  of  these  shifts.  Therefore,  the  operations 
count  for  the  PFA  small-N  DFTs  shown  in  Table  3.7  is 
modified  to  produce  Table  3.10.  Using  the  results  of 
Eqs (3.333) through (3.338), the N adds required for the
output mapping, and Table 3.10, the numbers of real
multiplications and additions are listed for all permissible N
values in Table 3.11a and b. The corresponding graphs in
Figures 3.35 and 3.36 show the multiplications and additions
as a function of N.

Even though this FORTRAN program did not use a shift
to perform multiplication by 1/2, incorporating shifts into
the  small-N  DFTs  represents  a  significant  savings  of  real 
multiplications.  The  major  benefit  would  be  in  small 
computers  where  software  multiplies  are  more  costly  relative 
to  additions.  The  benefit  of  performing  multiplications  by 
using shifts is given in Table 3.11a and b under the PCT
(percentage)  column.  PCT  was  calculated  by: 

PCT  =  ( (M-MS) *100)/M  (3.339) 

where  M  is  the  number  of  multiplications  without  using 
shifts  and  MS  is  the  number  using  shifts.  The  percentage 
savings  as  a  function  of  N  was  plotted  in  Figure  3.37  for 
all  values  of  N. 
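Eq (3.339) in code form, as a minimal sketch. The function name and the example counts are hypothetical and are not values from Table 3.11.

```python
def shift_savings_pct(mults_without_shifts, mults_with_shifts):
    """Eq (3.339): PCT = ((M - MS) * 100) / M."""
    m, ms = mults_without_shifts, mults_with_shifts
    return (m - ms) * 100 / m


if __name__ == "__main__":
    # Hypothetical counts: 400 multiplies without shifts, 300 with shifts.
    print(shift_savings_pct(400, 300))  # 25.0
```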

3.5.8 Memory Requirements for PFA. The PFA program
listed in Appendix I requires 770 words of program memory
when compiled for the CDC Cyber 74 computer. The memory
array requirements are given by:

X, Y, A, B: length N

The memory array required by PFA is given by:

memory array = 4N
TABLE 3.11a

PFA REAL OPERATIONS AND MEMORY COUNT FOR N<72

TABLE 3.11b

PFA REAL OPERATIONS AND MEMORY COUNT FOR N>80
Figure 3.36. Real Additions for the PFA.

Memory Array Required by PFA.

and is listed in Table 3.11a and b and plotted in Figure 3.38.

3.5.9 Summary. Two algorithms which use high-speed
convolution techniques have been presented. Both use
the convolution for computing small-N DFTs, and both require
N to be factored into relatively prime factors. This
particular factorization uses the Chinese Remainder Theorem
and the "Sino correspondence" to reorder the data arrays.
The theory, structure, and operations count were presented
in this section.
Comparison Results of Efficient
Discrete Fourier Transforms

4.1 Introduction

Several fixed radix and mixed radix algorithms have
been studied, and the number of real operations and memory
count required have been computed in the preceding sections.
The results from these sections are compared and presented
here.

Tradeoffs and advantages of fixed radix and mixed radix
algorithms are discussed, the justification for selecting
Singleton's algorithm over the IMSL mixed radix FFT
is given, and tables and graphs comparing the conventional
mixed radix FFT with the fast convolution algorithms (WFTA
and PFA) are presented and the advantages of each discussed.
This chapter concludes with an algorithm which selects the
most efficient algorithm based on memory available, machine
speed, zero-packing, and sequence length. A flowchart
implementation of the algorithm is included.

The timing tests in this section used the Cyber 74
system clock. This clock was accessed using the FORTRAN
function SECOND(CP), which provides a timer accurate to .001
seconds. The transforms were all performed using samples
from the function e^(...) cos 50πt, which has the magnitude
transform shown in Figure 4.1 for N=625.
The memory comparisons made in this chapter are based
on memory array requirements. The program memory resulting from
compilation on the Cyber 74 is not applicable to smaller
machines and would not permit valid memory comparisons. The
program memory required for the Cyber 74 is given to show
the relative sizes of the algorithms.

4.2 Conventional Radix-3 vs R(u) Field Radix-3

In the previous chapter the real operations counts for
these two radix-3 FFTs were given in Table 3.2. From this
table the most efficient radix-3 algorithm can be selected
based on machine speed. This table was validated
using the CDC Cyber 74 computer, which has a 1.1
multiply-to-add time ratio (see the test data in Appendix J).

With  a  1.1  multiply-to-add  ratio  Table  3.2  indicates 
that  the  conventional  radix-3  algorithm  is  more  efficient 
for  all  sequence  lengths  shown.  The  timing  results  in 
Table  4.1  verify  this  conclusion. 

4.3 Fixed Radix vs Mixed Radix FFTs

In Sections 3.3 and 3.4 the real operations count and
memory requirements were developed for the fixed radix and mixed
radix FFTs. Using the results from these sections, the real
operations count and memory requirements are given in Table
4.2 along with results from timing tests conducted on the
CDC Cyber 74. This table demonstrates that Singleton's
mixed radix FFT (MFFT) reduces the operations count for
factors of 2, 3, and 5 to the level of the fixed radix
algorithms.
TABLE 4.1

RADIX-3 TIMING COMPARISON

    N    Conventional Radix-3   R(u) Field Radix-3
         Time (sec)             Time (sec)
   27    .002                   .003
   81    .009                   .011
  243    .026                   .034
  729    .094                   .117
 2187    .305                   .393
TABLE 4.2


The program memory required by each algorithm is given
in Table 4.3. The large size of the MFFT is a result of the
extra sections needed to transform a sequence of any length and
the extra FORTRAN code required to perform multi-variate
transforms. None of the other FFTs is capable of performing
multi-variate transforms without a significant amount of
additional user programming. Singleton's MFFT can perform
up to a tri-variate transform; however, this additional
flexibility is a disadvantage on memory-limited computers
when performing single-variate FFTs.

The fixed radix and mixed radix FFTs are roughly
equivalent in efficiency. The fixed radix FFTs offer a
memory savings over the MFFT for all radix-2 transform
sequence lengths shown in Table 4.2 and some of the radix-3
and radix-5 transform lengths. The main advantage the MFFT offers
is the capability to transform a sequence of any length N, while
the fixed radix algorithms are limited to integer powers
of 2, 3, and 5.


4.4 Mixed Radix FFT Comparison: IMSL vs Singleton

In Chapter 3 and Appendix G the real operations and
memory required for the IMSL and Singleton's mixed radix
FFTs were derived as a function of N. Those two algorithms
are now compared on the basis of real operations and memory,
and the better algorithm is selected.
TABLE 4.3

PROGRAM MEMORY REQUIRED BY FFTs

Algorithm                  Program Memory (words)
Radix-2                    (illegible)
Radix-3                    (illegible)
Radix-5                    (illegible)
Singleton's Mixed Radix    1100


The expression for real multiplications and additions
developed for Singleton's MFFT is subtracted from the IMSL
FFT expression for real operations to show the extra
operations required by IMSL. Recall that both the Singleton and
IMSL versions of the FFT compute sine and cosine using the
difference equation of Section 3.1. Both implement the
sine and cosine computation similarly and require the same
number of real operations to compute them.
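The difference equation of Section 3.1 is not reproduced in this section. The sketch below is an illustration only: it shows one widely used form of a sine/cosine recurrence of this kind. Whether it matches Section 3.1 term for term is an assumption, and the function name is hypothetical. This form builds each step from the small constants alpha = 2 sin^2(theta/2) and beta = sin(theta) instead of rotating directly by cos(theta). That choice is commonly preferred because it limits round-off growth.

```python
import math


def twiddles(theta, count):
    """Return [(cos k*theta, sin k*theta) for k = 0 .. count-1] by recurrence.

    Uses alpha = 2*sin^2(theta/2) = 1 - cos(theta) and beta = sin(theta):
        c_{k+1} = c_k - (alpha*c_k + beta*s_k)
        s_{k+1} = s_k + (beta*c_k - alpha*s_k)
    Each step costs 4 real multiplications and 4 real additions.
    """
    alpha = 2.0 * math.sin(theta / 2.0) ** 2
    beta = math.sin(theta)
    c, s = 1.0, 0.0
    values = []
    for _ in range(count):
        values.append((c, s))
        # Tuple assignment: both updates use the old c and s.
        c, s = c - (alpha * c + beta * s), s + (beta * c - alpha * s)
    return values


if __name__ == "__main__":
    theta = 2 * math.pi / 8
    for k, (c, s) in enumerate(twiddles(theta, 8)):
        print(k, round(c, 12), round(s, 12))
```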

Assuming that N can be factored as:

$$N = 2^{r}\, 3^{s}\, 4^{t}\, 5^{u}\, p_1^{m_1} \cdots p_k^{m_k} \qquad (4.1)$$

the difference in real multiplications between IMSL and
Singleton's becomes:

$$\Delta\,\text{multiplies} = [\text{IMSL multiplication expression}] - [\text{Singleton multiplication expression}]$$

$$\begin{aligned}
\Delta\,\text{multiplies} = {} & \Big[\, 2rN + 4sN + 3tN + 8 + \tfrac{32uN}{5} + \sum_{i=1}^{k} \Big( 2(p_i - 1) + \tfrac{4 m_i N (p_i - 1)}{p_i} + \tfrac{m_i N (p_i - 1)^2}{p_i} \Big) - 4(N - 1) + \text{KMULT} \,\Big] \\
& - \Big[\, 2rN + 4sN + 3tN + \tfrac{32uN}{5} + \sum_{i=1}^{k} \Big( 2(p_i - 1) + \tfrac{m_i N (p_i - 1)^2}{p_i} + \tfrac{4 m_i N (p_i - 1)}{p_i} \Big) - 4(N - 1) + \text{KMULT} \,\Big] \\
= {} & 8
\end{aligned} \qquad (4.2)$$

Because this difference is a constant, it is negligible
for large values of N.
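The difference in real additions is derived from:

$$\Delta\,\text{adds} = [\text{IMSL addition expression}] - [\text{Singleton addition expression}]$$

$$\begin{aligned}
\Delta\,\text{adds} = {} & \Big[\, 3rN + 6sN + \tfrac{15tN}{2} + 4 + \tfrac{48uN}{5} + \sum_{i=1}^{k} \Big( (p_i - 1) + \tfrac{8 m_i N (p_i - 1)}{p_i} + \tfrac{m_i N (p_i - 1)^2}{p_i} \Big) - 2(N - 1) + \text{KADD} \,\Big] \\
& - \Big[\, 3rN + \tfrac{16sN}{3} + \tfrac{11tN}{2} + 8uN + \sum_{i=1}^{k} \Big( (p_i - 1) + \tfrac{7 m_i N (p_i - 1)}{p_i} + \tfrac{m_i N (p_i - 1)^2}{p_i} \Big) - 2(N - 1) + \text{KADD} \,\Big] \\
= {} & \tfrac{2sN}{3} + 2tN + \tfrac{8uN}{5} + 4 + \sum_{i=1}^{k} \tfrac{m_i N (p_i - 1)}{p_i}
\end{aligned} \qquad (4.3)$$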

The results from Eqs (4.2) and (4.3) demonstrate that
the IMSL FFT has approximately the same number of real
multiplications but requires significantly more additions than
Singleton's mixed radix algorithm. Based on these results,
and because the data reordering for the two subroutines
is the same, the Singleton FFT is the more efficient of the
two subroutines. This conclusion was confirmed by timing
tests on the CDC Cyber 74 computer at AFIT. The results
are shown in Table 4.4 for selected sequence lengths.
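Eq (4.3) can be evaluated for a specific factorization. The minimal sketch below assumes the caller has already split N into the exponents s, t, u and the (pi, mi) pairs of Eq (4.1). How the powers of 2 are divided between factors of 4 and 2 follows the MFFT's own factoring and is not modeled here. Exact fractions avoid rounding in the N/3 and N/5 terms. Only additions are computed, because Eq (4.2) shows the multiplication difference is the constant 8. The function name is hypothetical.

```python
from fractions import Fraction


def extra_imsl_adds(n, s=0, t=0, u=0, other_primes=()):
    """Eq (4.3): real additions of the IMSL FFT minus Singleton's MFFT.

    n: sequence length N = 2^r 3^s 4^t 5^u p1^m1 ... pk^mk.
    other_primes: (p_i, m_i) pairs for prime factors other than 2, 3, 5.
    The exponent r of 2 cancels in Eq (4.3) and is not needed.
    """
    n = Fraction(n)
    delta = 2 * s * n / 3 + 2 * t * n + 8 * u * n / 5 + 4
    for p, m in other_primes:
        delta += m * n * (p - 1) / p
    return delta


if __name__ == "__main__":
    print(extra_imsl_adds(60, s=1, t=1, u=1))        # 260  (60 = 4 * 3 * 5)
    print(extra_imsl_adds(7, other_primes=[(7, 1)]))  # 10   (N prime)
```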

The  memory  array  required  for  each  of  the  algorithms 
was  derived  in  the  preceding  chapter.  Those  results  are 
now  compared  for  N  less  than  200  and  the  percentage  of  array 
memory  saved  by  Singleton's  FFT  over  the  IMSL  FFT  was  plotted 
in  Figure  4.2  using  the  equation: 
TABLE 4.4

TIMING RESULTS FOR IMSL AND SINGLETON FFTs

    N    IMSL Time (sec)   Singleton Time (sec)
   60    .010              .008
  120    .018              .014
  125    .019              .012
  128    .013              .011
  210    .039              .036
  243    .031              .031
  256    .028              .021
  315    .054              .052
  420    .081              .072
  504    .090              .082
  625    .128              .076
  729    .107              .107
  840    .163              .150
 1008    .151              .157
 1024    .126              .092
 1250    .275              .158
 1260    .268              .231
 2048    .269              .224
 2187    .366              .364
 2520    .365              .495
Figure 4.2. Memory Array Saved Using Singleton's FFT


% savings = (MEMCC - MEMSNG) • 100/MEMCC     (4.4)

where MEMCC = IMSL's array memory

MEMSNG = Singleton's array memory

From the plot it is evident that Singleton's algorithm uses
less memory than the IMSL program. The "flat" portion of
the curve approaches 57%, which can be verified by examination
of Eqs (3.172) through (3.178) for N a prime number.
This number represents the memory savings at the points
where N is prime.

The values of M, K, KB, and JK used to compute the
IWORD constant in Eq (3.173) are M=1, K=N, KB=N-2, and JK=1:

$$\text{IWORD} = 3M + 3 + \max(4M + 7 + 6K,\ KB + 1 + 2\,JK) \qquad (4.5)$$

$$\text{IWORD} = 3 + 3 + \max(6N + 11,\ N + 1) \qquad (4.6)$$

$$\text{IWORD} = 6N + 17 \qquad (4.7)$$

Now  the  memory  for  IMSL  given  that  N  is  prime  becomes: 

MEMCC  =  2  •  N  +  2(6  •  N  +  17)  (4.8) 

MEMCC  =  14  •  N  +  34  (4.9) 

The array memory required by Singleton's FFT is based
on the values NP and KD. NP is dimensioned to one less than
the product of the square-free factors of N, or, if at most one
square-free factor is present, NP can be dimensioned to M+1,
where M is the number of prime factors in N. KD is the size
of arrays AT, BT, CK, and SK, where KD equals the largest
prime factor in N. Using these results, the expression for
array memory where N is prime becomes:

MEMSNG  =  2  •  N  +  4  •  KD  +  NP  (4.10) 
For N prime, KD = N and NP = M + 1 = 2. Substituting for NP and KD,
this equation becomes:

MEMSNG = 2 • N + 4 • N + 2     (4.11)

MEMSNG = 6 • N + 2     (4.12)

Substituting Eqs (4.9) and (4.12) into the percentage
expression in Eq (4.4), the savings is seen to approach
approximately 57%:

% savings = ((14 • N + 34) - (6 • N + 2))

• 100/(14 • N + 34)     (4.13)

% savings = (8 • N + 32) • 100/(14 • N + 34)     (4.14)

As N gets large, Eq (4.14) becomes:

% savings ≈ 800N/14N ≈ 57%     (4.15)

which corresponds to the results shown by Figure 4.2.
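The limit in Eq (4.15) can be checked numerically. The minimal sketch below applies only to prime N, the case for which Eqs (4.9) and (4.12) were derived. The function names are hypothetical.

```python
def imsl_array_words(n):
    """Eq (4.9): IMSL array memory MEMCC for prime N."""
    return 14 * n + 34


def singleton_array_words(n):
    """Eq (4.12): Singleton array memory MEMSNG for prime N."""
    return 6 * n + 2


def percent_saved(n):
    """Eq (4.4): (MEMCC - MEMSNG) * 100 / MEMCC."""
    memcc, memsng = imsl_array_words(n), singleton_array_words(n)
    return (memcc - memsng) * 100 / memcc


if __name__ == "__main__":
    for n in (7, 101, 1009, 10007):
        print(n, round(percent_saved(n), 2))
```

The printed values fall toward 800/14, about 57.1 percent, as N grows. The small-N values are higher because of the constant terms 34 and 2.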

The memory array must be added to the program memory
to determine the size of the program. The program memory
required by each algorithm was determined by compiling each
algorithm for the CDC Cyber 74. The IMSL FFT used 1061
words and the Singleton FFT used 1100 words. The larger
size of the Singleton FFT relative to the IMSL version
is due to the extra FORTRAN code needed to perform
multi-variate FFTs. These program memory figures are only
applicable to the FORTRAN compiler used here at AFIT;
however, they do provide a relative measure of the program
memory size. Singleton's program requires about 3.7 percent
(39 words) more program memory.

The results for real operations count and memory
required show that Singleton's mixed radix FFT is superior
to the IMSL algorithm. For this reason, Singleton's FFT is the
mixed radix subroutine carried forward for comparison to the
WFTA and PFA in the following sections.

4.5 Conventional vs Fast Convolution Mixed Radix FFTs

Singleton's algorithm (MFFT) is referred to as a
"conventional" FFT because it uses the Cooley-Tukey
decimation and reordering of the data array. The WFTA and
PFA use Winograd's small-N fast convolution algorithms
to perform the DFT. The operation and memory array counts
are presented in Figures 4.3 and 4.4 and Tables 4.5a and b
as a function of N for comparison of the three algorithms.
These tables and plots illustrate the advantages and
disadvantages of each algorithm and are used along with the
fixed radix results in Table 4.2 to select the most
efficient algorithm for a particular sequence length and
machine capability (size and speed).

The tables and plots refer to the algorithms as MFFT
(Singleton), WFTA (Winograd), and PFA (Kolba-Parks). The
PFA used for operation counts and memory comparisons is
the one described by Burrus and Eschenbacher, which includes
prime power factors of 2, 3, 4, 5, 7, 8, 9, and 16. The FORTRAN
coded program for PFA was obtained from C. S. Burrus of
Rice University and does not make use of "shifts" for
multiplications by 1/2. Both the WFTA and MFFT FORTRAN
programs were obtained from the IEEE Press "Programs for
Digital Signal Processing".
The memory comparison was based on memory arrays only
and did not include program memory. This was done because
the program memory changes with machine word length.

The program memory required for the Cyber 74 is given for
each algorithm so that the relative sizes can be compared.

4.5.1 Real Operations Count. The mixed radix MFFT
written by Singleton includes special sections for factors
of 2, 3, 4, and 5 as well as a general section for odd
prime factors, which permits the transformation of any
positive integer N length sequence. Because of the special
sections, the operations count is lower for an N that is
highly factorable by 2, 3, 4, or 5 than for an N containing
larger prime factors. Figures 4.3 and 4.4 demonstrate the efficiency of
Singleton's MFFT relative to the radix-2 complex transform
multiplications and additions counts of 2N log2 N and
3N log2 N, respectively (Winograd, 1976). The MFFT
operations counts shown in Figures 4.3a,b and 4.4a,b are for N
factorable by 2, 3, 4, or 5, or combinations thereof. The
WFTA and PFA counts are shown for all 59 sequence lengths
which they can transform. Recall from Sections 5.4 and 3.5
that WFTA and PFA sequence lengths are limited by the data
reordering algorithm used by the WFTA and PFA. These
figures also reflect the WFTA "post-initialization" operations
count. As shown in Section 3.5, the post-initialization
count is significantly less than the number of
operations required for the initial transform of length N.
Figure 4.3a. Real Multiplication Comparison for PFA, WFTA, and MFFT


Figure 4.3b. Real Multiplication Comparison for PFA, WFTA, and MFFT


Figure 4.4a. Real Addition Comparison for PFA, WFTA, and MFFT


Figure 4.4b. Real Addition Comparison for PFA, WFTA, and MFFT
data presented here was collected by timing the individual
subroutines (INISHL, PERM 1, WEAVE 1, MULT, WEAVE 2, PERM 2)
in the WFTA for different sequence lengths and then dividing
the time required for each subroutine by the total time for
all of the subroutines. Comparing the MFFT and PFA against
the post-initialized WFTA is assumed to be valid because
most applications of DFTs involve the repeated transform
of N length sequences.

A point-by-point comparison of MFFT, WFTA, and PFA
real operations is presented in Table 4.7. The
sequence lengths in these tables represent the only lengths
permissible for both PFA and WFTA, whereas the mixed radix
MFFT can transform any sequence length. The operations
counts presented in Tables 4.2 and 4.7, combined with a computer's
multiply and add speeds, can predict the most efficient
(fastest) DFT technique for that particular computer.
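Such a prediction weights each count by its operation time and keeps the cheapest algorithm. The minimal sketch below ignores array indexing and data reordering, which the text notes account for the remaining measured time. It uses the Cyber 74 speeds given in the next paragraph. The 10^-6 exponent is an assumption, because it is hard to read in the source scan. The example counts are the N = 630 figures quoted in Section 4.5.3, and the helper names are hypothetical.

```python
# Cyber 74 speeds from Appendix J; the 10^-6 exponent is an assumption.
T_MULT = 1.9e-6   # seconds per real multiplication
T_ADD = 1.7e-6    # seconds per real addition


def predicted_seconds(mults, adds, t_mult=T_MULT, t_add=T_ADD):
    """Execution time predicted from real operation counts only."""
    return mults * t_mult + adds * t_add


def fastest(counts, t_mult=T_MULT, t_add=T_ADD):
    """Name of the algorithm with the smallest predicted time."""
    return min(counts, key=lambda name: predicted_seconds(*counts[name], t_mult, t_add))


if __name__ == "__main__":
    # N = 630 operation counts quoted in Section 4.5.3.
    counts = {"PFA": (4352, 18534), "WFTA (post-initialization)": (2376, 22072)}
    for name, (m, a) in counts.items():
        print(f"{name}: {predicted_seconds(m, a):.4f} s")
    print(fastest(counts))  # PFA
```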

Using the multiply and add speeds determined for
the CDC Cyber 74 (see Appendix J) of 1.9 x 10^-6 seconds
and 1.7 x 10^-6 seconds, respectively, the algorithms'
execution speeds were predicted from the operations counts
in Tables 3.9 and 4.7. The predicted execution speeds
do not account for all of the actual execution time
measured, as shown in Figure 4.5. The extra time which
was not predicted by the real operations count comes
from array indexing and data reordering needed in all of
TABLE 4.6

TIMING RESULTS FROM THE WFTA SUBROUTINES

    N    INISHL   PERM 1   WEAVE 1   MULT   WEAVE 2   PERM 2
  315    48.0%    7.5%     16.3%     4.5%   16.3%     7.4%
  360    47.0%    5.9%     15.7%     5.9%   21.6%     3.9%
  630    43.9%    5.6%     18.7%     5.5%   21.5%     4.7%
  720    44.0%    3.5%     20.0%     6.1%   22.8%     3.6%
  840    34.5%    5.5%     23.6%     6.4%   23.6%     6.4%
 1008    48.0%    1.7%     19.2%     6.2%   21.5%     3.4%
 1260    38.2%    5.3%     18.1%     6.4%   27.7%     4.3%

Results are given as % of total time
to execute WFTA.
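Table 4.6 also gives a rough estimate of how much of a first call remains once INISHL is skipped. That estimate assumes the other five subroutines cost the same on later calls. The minimal sketch below works over the tabulated percentages, and its helper names are hypothetical.

```python
# Percent of total WFTA execution time per subroutine, Table 4.6.
SUBROUTINES = ("INISHL", "PERM 1", "WEAVE 1", "MULT", "WEAVE 2", "PERM 2")
TABLE_4_6 = {
    315: (48.0, 7.5, 16.3, 4.5, 16.3, 7.4),
    360: (47.0, 5.9, 15.7, 5.9, 21.6, 3.9),
    630: (43.9, 5.6, 18.7, 5.5, 21.5, 4.7),
    720: (44.0, 3.5, 20.0, 6.1, 22.8, 3.6),
    840: (34.5, 5.5, 23.6, 6.4, 23.6, 6.4),
    1008: (48.0, 1.7, 19.2, 6.2, 21.5, 3.4),
    1260: (38.2, 5.3, 18.1, 6.4, 27.7, 4.3),
}


def post_init_share(n):
    """Percent of the first-call time spent outside INISHL."""
    return sum(TABLE_4_6[n][1:])


def weave_share_after_init(n):
    """Percent of the post-initialization time spent in WEAVE 1 and WEAVE 2."""
    row = dict(zip(SUBROUTINES, TABLE_4_6[n]))
    return 100 * (row["WEAVE 1"] + row["WEAVE 2"]) / post_init_share(n)


if __name__ == "__main__":
    for n in TABLE_4_6:
        print(n, round(post_init_share(n), 1), round(weave_share_after_init(n), 1))
```

For N = 315, for instance, about 52 percent of the first-call time remains. The two WEAVE subroutines, which perform the additions of Eq (3.229), take the largest share of that remainder.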
... AND PREDICTED TIMING RESULTS FOR MFFT, WFTA, AND PFA

[Table entries are not recoverable from the source scan.]

WFTA1 uses the initialization subroutine and WFTA2
[...]
based only on real operations are sufficient to select the
most efficient algorithm, as demonstrated by Table 4.7.

The timing results in Table 4.7 compare one-to-one with
the predicted times (given the standard deviations shown in
parentheses) for all three algorithms. Several observations
can be made from Table 4.7. First, the WFTA1, which
represents the initial transform made by WFTA, may be slower
than MFFT for certain sequence lengths. Examples of this
are N=315, 630, and 720, all of which were correctly
predicted to be slower from the operations counts in Tables
3.9 and 4.6. Second, the post-initialized WFTA2 and the
PFA were predicted to be, and are, faster than MFFT for all
sequence lengths. Third, the PFA and WFTA2 (post-initialization)
are close in efficiency for all sequence lengths.

4.5.2 Memory. The memory arrays for MFFT, WFTA, and
PFA were compiled from the previous chapter and are presented in
Figure 4.6 and Table 4.5a and b. The figure clearly
demonstrates how much less memory array is required by MFFT.

These results are due to the efficient data reordering
technique of MFFT, which can essentially be done in place
with very little additional memory relative to the sequence
length. The WFTA and PFA base their data reordering on
the Chinese Remainder Theorem, and the PFA requires two additional
length-N arrays. The WFTA uses even more memory
array because of the algorithm's structure, which "nests"
multiplications inside all the additions. This requires
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Figure  4.6.  Memory  Arrays  Required  by  MFFT,  WFTA,  and  PFA 
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store  the  multiplication  coefficients  and  provide  working 
array  storage  because  the  WFTA  is  not  computed  in-place. 

The program memory was not included in the tabulations for
comparison because the program memory required depends on the
machine word size. The program memory required on the Cyber 74
for each algorithm is:

PFA program memory = 770 words
WFT program memory = 2348 words
FFT program memory = 1100 words

These results were obtained with the standard compiler command FTN
for the FORTRAN IV language. For short sequences these program
memory requirements contribute significantly to the choice of the
most memory-efficient algorithm.

4.5.3 WFTA vs PFA Operations Count. The tradeoffs between
WFTA and PFA for real multiplications and additions can be seen in
Figures 4.3 and 4.4. In most cases the WFTA requires fewer
multiplications but more additions than the PFA.

The selection of the most efficient algorithm then becomes
dependent on the machine speed of real addition compared to real
multiplication. As an example of this tradeoff between additions
and multiplications, consider the case of N=630. For this sequence
length the PFA requires 4352 multiplications and 18534 additions,
while the WFTA requires 2376 multiplications and 22072 additions.
Assuming a machine add speed of 1.7 x 10^-6 seconds and a multiply
speed of 1.9 x 10^-6 seconds [...] For the selected add and multiply
speeds, the PFA was faster. However, consider the case where a
multiply requires three times the addition time, or 5.1 x 10^-6
seconds. For the same N=630 the PFA time is predicted to be .054
seconds and the WFTA time .050 seconds. With the increase in
multiply time from 1.9 to 5.1 microseconds, the WFTA became the
more efficient algorithm. This example illustrates why the add and
multiply speeds must be known to select the fastest algorithm for a
particular sequence length N.
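The arithmetic in this example can be reproduced with a short sketch. It assumes, as the text does, that execution time is simply (multiplications x multiply time) + (additions x add time); the function and constant names are ours, not taken from the original programs.

```python
def predicted_time(mults, adds, mult_time, add_time):
    """Predicted DFT time in seconds from real-operation counts only.

    Indexing, data movement, and other overheads are ignored, so this is a
    first-order estimate rather than a measured time.
    """
    return mults * mult_time + adds * add_time


# Real-operation counts for N = 630 quoted in the text: (multiplications, additions)
PFA_630 = (4352, 18534)
WFTA_630 = (2376, 22072)
ADD_TIME = 1.7e-6  # seconds per real addition

for mult_time in (1.9e-6, 5.1e-6):
    pfa = predicted_time(*PFA_630, mult_time, ADD_TIME)
    wfta = predicted_time(*WFTA_630, mult_time, ADD_TIME)
    faster = "PFA" if pfa < wfta else "WFTA"
    print(f"M = {mult_time * 1e6:.1f} us: PFA {pfa:.4f} s, WFTA {wfta:.4f} s -> {faster}")
```

With a 5.1 microsecond multiply this gives about .0537 s for the PFA and .0496 s for the WFTA, matching the .054 and .050 seconds quoted above; with a 1.9 microsecond multiply the PFA comes out ahead (about .040 s versus .042 s).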

The effects of changing the multiply-to-add ratio from 1 to 20
are shown in Figure 4.7a, b, and c for MFFT, WFTA, and PFA. For the
sequences N=315 and 1008 the PFA is most efficient at low
multiply-to-add ratios, but as multiplies become "more costly" the
WFTA soon becomes the most efficient. For N=30 the WFTA is the most
efficient for all ratios.
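The crossover seen in Figure 4.7 can be located directly from two operation counts: setting the two predicted times equal and solving for M/A gives the multiply-to-add ratio above which the WFTA wins. This break-even ratio is a derived quantity, not one tabulated in the original; the sketch assumes the WFTA uses fewer multiplications (the usual case described above). The N=630 counts come from the example above and the N=420 counts from the selection example in Section 4.7.

```python
def crossover_ratio(wfta, pfa):
    """Multiply-to-add time ratio M/A above which the WFTA is predicted faster.

    wfta and pfa are (multiplications, additions) tuples. Setting
    wfta_m*M + wfta_a*A = pfa_m*M + pfa_a*A and solving for M/A assumes the
    WFTA uses fewer multiplications; a result <= 0 means the WFTA is
    predicted faster at every ratio.
    """
    saved_mults = pfa[0] - wfta[0]
    extra_adds = wfta[1] - pfa[1]
    if saved_mults <= 0:
        raise ValueError("expected the WFTA to use fewer multiplications")
    return extra_adds / saved_mults


ratio_630 = crossover_ratio(wfta=(2376, 22072), pfa=(4352, 18534))
ratio_420 = crossover_ratio(wfta=(1296, 11352), pfa=(2528, 10956))
print(f"N=630: WFTA faster when M/A > {ratio_630:.2f}")
print(f"N=420: WFTA faster when M/A > {ratio_420:.2f}")
```

For N=630 the break-even ratio is about 1.79, consistent with the earlier example: 1.9/1.7 (about 1.12) favors the PFA, while a ratio of 3 favors the WFTA.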

4.6 Flexibility of the DFT Algorithms

It is clear from the plots in Figures 4.3 and 4.4 and the data in
Table 4.2 that the fixed radix FFT, PFA, and WFTA are somewhat
limited in permissible sequence lengths, whereas the mixed radix FFT
provides a much more "dense" selection, even for sequence lengths
factorable by only 2, 3, 4, or 5. The restriction in possible values
for N [...]

Figure 4.7c. Relative Efficiencies of MFFT, WFTA, and PFA
4.7 An Algorithm to Select the Most Efficient DFT Technique.

The results of this chapter are used to develop a
systematic approach to selecting the most efficient DFT
method from the fixed radix FFTs, mixed radix FFT (MFFT),
WFTA, and PFA. A flowchart is presented which selects the
most efficient algorithm based on real operations, computer
memory, machine speed, and sequence length. The algorithm
requires inputs of machine speed for add and multiply,
sequence length, zeropack limits, and computer memory. This
algorithm also assumes that the same length sequence will
be repeatedly transformed, so that the WFTA is initialized
only once.

4.7.1 Arguments. The algorithm requires inputs:

N: Sequence length to be transformed

NP: The upper limit to which the sequence length can be filled
to reach an efficient transform length.

A: Machine addition speed

M: Machine multiplication speed
4.7.2 Usage. The algorithm is presented as a flowchart. The basic
logic of the algorithm is:

(1)  Zeropack  (if  permitted)  to  the  nearest  WFTA  or  PFA 
sequence  length. 

(2)  Determine  the  memory  requirements  for  the  WFTA  and  PFA. 

(3)  If  WFTA  and  PFA  both  fit  in  computer  memory  available, 
select  between  the  two  by  using  real  operations  and 
computer  speed. 

(4)  If  only  PFA  or  WFTA  fit  in  computer  memory,  select  the 
one  that  fits. 

(5) If neither PFA nor WFTA will fit in computer memory,
zeropack to the nearest N that is an integer power of 2, 3, or 5.
Choose the most efficient algorithm from the fixed radix
FFT and MFFT based on real operations counts and
machine speed.

(6) If a fixed radix FFT cannot be used, zeropack to the nearest
N factorable by 2, 3, or 5 and use the mixed radix FFT.
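The six steps can be sketched in Python. This is a hypothetical illustration, not the flowchart of Figure 4.8 itself: the table format, the memory figures used to exercise it, and all function names are assumptions. Only the N=420 operation counts in the usage below come from the text, and step (5)'s comparison of fixed radix FFT against MFFT is not modeled because it needs the operation-count tables.

```python
def predicted_time(mults, adds, mult_time, add_time):
    """First-order time estimate from real-operation counts."""
    return mults * mult_time + adds * add_time


def next_power(n, base):
    """Smallest power of base that is >= n."""
    p = 1
    while p < n:
        p *= base
    return p


def next_5_smooth(n):
    """Smallest m >= n with no prime factor larger than 5."""
    m = max(n, 1)
    while True:
        k = m
        for p in (2, 3, 5):
            while k % p == 0:
                k //= p
        if k == 1:
            return m
        m += 1


def select_dft(n, np_limit, add_time, mult_time, memory, table):
    """Hypothetical sketch of steps (1)-(6).

    table maps each tabulated WFTA/PFA length to
    {"WFTA": (mults, adds, words), "PFA": (mults, adds, words)}.
    Returns (algorithm name, transform length).
    """
    # (1) Zeropack to the nearest tabulated WFTA/PFA length within the limit.
    candidates = sorted(length for length in table if n <= length <= np_limit)
    if candidates:
        length = candidates[0]
        # (2) Keep only the algorithms whose memory requirement fits.
        fits = {name: c for name, c in table[length].items() if c[2] <= memory}
        if fits:
            # (3)-(4) If both fit pick the faster; if only one fits it is chosen.
            best = min(
                fits,
                key=lambda name: predicted_time(fits[name][0], fits[name][1], mult_time, add_time),
            )
            return best, length
    # (5) Zeropack to a power of 2, 3, or 5 if one lies within the limit.
    powers = [p for p in (next_power(n, b) for b in (2, 3, 5)) if p <= np_limit]
    if powers:
        return "fixed radix FFT", min(powers)
    # (6) Otherwise zeropack to a length factorable by 2, 3, and 5 for the MFFT.
    return "MFFT", next_5_smooth(n)


# N=420 counts from Table 4.6 as quoted below; the memory words are made up.
TABLE = {420: {"WFTA": (1296, 11352, 5000), "PFA": (2528, 10956, 2000)}}
print(select_dft(410, 451, 450e-9, 1000e-9, 10**6, TABLE))
```

With ample memory this returns `('WFTA', 420)`, the same outcome as the worked N=410 example that follows.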

Using the flow diagram of Figure 4.8a, b, and c together with the
specified tables, the most efficient algorithm can be selected.

An example for N=410 demonstrates the use of Figure 4.8
and the tables in this paper to select the most efficient
DFT. Given A=450 nanoseconds (ns), M=1000 ns, 10%
zeropacking permitted, and no memory limitations, the steps are:

(1)  MEM  is  very  large  and  is  not  a  limitation 

(2)  N=410 

(3)  NP=410  +  .10(410)  =  451 
Figure 4.8a. Flowchart to Select Most Efficient Algorithm.

(4) NP=N? No, continue

(5) NP ≤ 5040? Yes, continue

(6) Zeropack to the nearest WFTA/PFA length given in Table 4.6,
which is NP=420.

(7)  PFA  fit  in  computer?  Yes,  continue 

(8)  WFTA  fit  in  computer?  Yes,  continue 

(9) Determine the fastest algorithm between WFTA and PFA from
Table 4.6. For N=420,

              WFTA      PFA
    Mult      1296     2528
    Add      11352    10956

Using A=450 ns and M=1000 ns, the predicted times are:
WFTA = 6.4 milliseconds
PFA = 7.5 milliseconds

For this sequence length N=420 and the given add and multiply
speeds, the WFTA is the fastest algorithm. However, if this sequence
were transformed only once for a particular application, so that
the WFTA could not be reused without initialization, the WFTA counts
must be taken from Table 3.11, where 49200 multiplications and 16200
additions are used to initialize the WFTA and perform the transform.
The WFTA is then predicted to take 56.5 milliseconds to transform
N=420. When selecting between WFTA and PFA, the particular
application must be considered.

It should also be noted that the predicted times from Table 4.6
are based only on real operations, which do not account for all of
the execution time required, as shown by the timing tests. For the
cases tested in Table 4.7 on the CDC Cyber 74, the real operations
accounted for an average of 67% of the PFA, 65% of the WFTA, and
61% of the MFFT actual execution time.


V.  Conclusions 


This  paper,  for  the  first  time,  presented  a  capability 
to  select  the  most  efficient  DFT  based  on  real  operations. 
These  real  operations  were  tabulated  and  plotted  as  a 
function  of  N.  The  algorithms  studied  and  compared  for  real 
operations  and  memory  include: 

1.  Radix-2  FFT  from  Rabiner  and  Gold. 

2.  Radix-3  FFT  written  by  the  author. 

3. Radix-3 FFT in R(u) from Dubois and
Venetsanopoulos.

4.  Radix-5  FFT  written  by  the  author. 

5.  Mixed  radix  FFT  for  factors  of  2,  3,  or  5 
written  by  the  author. 

6.  IMSL  mixed  radix  FFT  which  can  transform 
any  sequence  length  N. 

7.  Singleton's  mixed  radix  FFT  which  can 
transform  any  sequence  length  N. 

8.  Winograd  Fourier  transform  algorithm  (WFTA) 
written  by  McClellan  and  Nawab. 

9. Prime Factor Algorithm (PFA) written by
Burrus and Eschenbacher.

5.1  Results  and  Conclusions 

The two radix-3 FFTs were compared for real operations
and memory required to perform the DFT of N-length sequences
where N=3^m. Selection criteria were developed and tabulated
based on machine speed. The new radix-3 FFT in the R(u)
field uses fewer real multiplications but more real additions
than the conventional radix-3 FFT. The more efficient of
the two algorithms depends on the relative costs of
multiplications and additions. The radix-3 FFT in R(u) is most
efficient when multiplications are costly.

All of the fixed radix algorithms were compared to the
Singleton mixed radix FFT for real operations and memory.
The operations counts show that the most efficient algorithm
depends on the multiplication and addition speeds of the computer.
Data were tabulated for selecting the best algorithm based on
this criterion. The FFT algorithm using the least memory can
also be selected from Tables 4.2 and 4.3. The limited choice
of sequence lengths possible with the fixed radix FFTs
reduces their utility compared to Singleton's mixed radix FFT.

Three conventional mixed radix FFT algorithms were compared
for efficiency, memory array, and flexibility. The
author's mixed radix FFT was very efficient but required
more memory array and was not as flexible, since N was limited
to factors of 2, 3, 4, and 5. It was shown that Singleton's
mixed radix FFT was more efficient and flexible, and used less
memory array, than the IMSL mixed radix FFT, and it was chosen
as the best conventional mixed radix FFT.

Singleton's mixed radix FFT (labeled MFFT) and the fixed
radix FFTs were compared to the WFTA and PFA. The real
operations and memory required were tabulated and plotted for
all of the N-length sequences permitted by WFTA and PFA.
This comparison showed that the WFTA and PFA required fewer
real operations but that the FFTs require less memory. The
MFFT was much more flexible than the WFTA or PFA, since N can be
any length.

The WFTA and PFA were then more closely studied and
the tradeoffs between the two were discussed. The PFA uses
fewer additions but more multiplications for most N-length
sequences, which means the WFTA is more efficient when
multiplications are "costly" relative to additions. The PFA uses
less memory than the WFTA, which makes the PFA preferable when
the machine is memory limited. Further criteria considered
in selecting between these two algorithms are (1) the
machine language and (2) the particular application of the
algorithms. If the machine language permits "shifts" to be
used for multiplication by 1/2, the PFA performance can be
improved. (The percentage improvements have been tabulated
for all permissible PFA sequence lengths.) The second
consideration affects the WFTA, since repeated use of the WFTA
for the same length N sequence does not require the algorithm
to re-initialize the multiplier coefficients. Improvements
in operating speed of 40% over the initial WFTA were realized
on the Cyber 74 for various sequence lengths.

An algorithm to select the most efficient DFT method
from WFTA (Winograd), MFFT (Singleton), fixed radix FFTs,
and PFA (Kolba and Parks) was presented. This selection is
based on minimizing real operations and minimizing memory
size for the machine used. Minimizing real operations is
the best "first order" criterion (Singleton, 1969) and was
verified by timing the transforms on the CDC Cyber 74. A
summary of the above conclusions is presented in Table 5.1.

The  PFA  was  chosen  as  the  best  DFT  technique  because 
it  minimizes  real  operations  well  below  the  FFT  levels, 
requires  substantially  less  memory  than  WFTA,  and  is  more 
flexible  than  the  fixed  radix  FFTs.  Of  course,  the  "optimum" 
algorithm  depends  on  the  specific  application  and  computer, 
but  for  general  applications  the  PFA  provides  the  best  mix 
of  minimizing  real  operations  and  memory. 

5.2 Recommendations

The above conclusions related to an algorithm's
efficiency were based on real operations and then verified
by timing tests on the CDC Cyber 74. The Cyber 74 is a
representative large mainframe computer with very high
speed additions and multiplications.

To  further  substantiate  the  conclusions  of  this  paper 
it  is  recommended  that  similar  timing  tests  be  made  on  other 
computers  (large  and  small)  available  at  AFIT  and  the  results 
compared  to  the  predicted  efficiencies  based  on  real  additions 
and  multiplications.  All  of  the  data  necessary  to  perform 
these  tests  is  available  in  this  paper. 
Appendix A. Radix-2 FFT Algorithm

This appendix presents an algorithm for computing the
complex fast Fourier transform (FFT) defined by:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \qquad k = 0, 1, \ldots, N-1$$

where N = 2^M, M an integer.

A FORTRAN subroutine is listed for computing the radix-2 FFT. It
computes a single-variate forward complex Fourier transform or one
variate of a multivariate transform.

Arguments.

A = The complex array to be transformed, which is
dimensioned to length N.

N = The integer sequence length to be transformed,
which must equal 2^M.

M = The integer power of 2.

Usage. For a single variate forward transform:

(1) Specify the input complex sequence A along with
parameters M and N.

(2) Dimension complex array A to length N.

(3) Call FFT2C(A,M,N).

(4) A contains the complex output vector X(k).
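A minimal Python sketch of what a radix-2 routine like FFT2C computes. It is not a translation of the listed FORTRAN subroutine; `fft2c` and `direct_dft` are illustrative names. Like the usage above, it transforms the array in place: the input is reordered into bit-reversed order, then log2(N) stages of two-point butterflies with twiddle factors are applied.

```python
import cmath


def fft2c(a):
    """In-place radix-2 decimation-in-time FFT of a list of complex numbers.

    Hypothetical Python counterpart of FFT2C(A,M,N): len(a) must be N = 2**M,
    and on return a holds X(k) = sum_n x(n) exp(-j 2 pi n k / N).
    """
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    # Reorder the input into bit-reversed order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(N) stages of two-point butterflies with twiddle factors.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                top = a[start + k]
                bottom = w * a[start + k + half]
                a[start + k] = top + bottom
                a[start + k + half] = top - bottom
                w *= w_step
        size *= 2
    return a


def direct_dft(x):
    """O(N^2) reference DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n)) for k in range(n)]
```

For example, `fft2c([1, 1, 1, 1])` leaves the list holding the transform `[4, 0, 0, 0]` (as complex values).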
Appendix B. Radix-3 FFT Algorithm

This section presents an algorithm for computing the
fast Fourier transform (FFT) based on a method called
decimation-in-time, described in Chapter III. This algorithm
is an efficient method for computing the transformation:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \qquad k = 0, 1, 2, \ldots, N-1$$

where X(k) and x(n) are complex valued. This algorithm
requires that the sequence length be N = 3^m, m = 0, 1, 2, ....

This appendix lists a FORTRAN subroutine for computing
the radix-3 FFT. This subroutine computes the single-variate
complex Fourier transform or calculates one
variate of a multivariate transform.


Arguments. 

A  =  The  real  part  of  the  array  to  be  transformed 
which  is  dimensioned  to  length  N. 

B  =  The  imaginary  part  of  the  array  to  be  transformed 
which  is  dimensioned  to  length  N. 


M = The exponent of 3.

N = The length of the data sequence (N = 3^M).

IW  =  A  work  vector  of  length  M. 

WKS  and  WKC  =  Storage  arrays  of  length  N  used  for 
sine  and  cosine  lookup  tables. 
Usage. For a single variate forward transform:

(1) Specify the input sequences A and B along with the
parameters M and N.

(2) Dimension A, B, IW, WKS, and WKC to the correct lengths.

(3) Call FFT3TM(A,B,M,N,IW,WKC,WKS).

(4) A and B are the output real and imaginary portions of
the complex vector X(k).
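The decimation-in-time idea behind a radix-3 routine can be sketched recursively: split x(n) into x(3r), x(3r+1), and x(3r+2), transform each, and recombine them with the twiddle factors W_N^{pk}. This is an illustrative sketch, not FFT3TM itself. It takes separate real and imaginary lists to echo the A and B arguments, but it does not use the WKC/WKS look-up tables or the IW work vector, and `fft3` is a hypothetical name.

```python
import math


def fft3(re, im):
    """Recursive radix-3 decimation-in-time FFT on separate real/imaginary lists.

    len(re) must equal len(im) and be 3**M. Returns new (real, imag) lists
    rather than overwriting the inputs.
    """
    n = len(re)
    if n == 1:
        return list(re), list(im)
    if n < 3 or n % 3:
        raise ValueError("length must be a power of 3")
    third = n // 3
    # N/3-point DFTs of the decimated sequences x(3r), x(3r+1), x(3r+2)
    subs = [fft3(re[p::3], im[p::3]) for p in range(3)]
    out_re, out_im = [0.0] * n, [0.0] * n
    for k in range(n):
        m = k % third  # the N/3-point DFTs repeat with period N/3
        for p, (sub_re, sub_im) in enumerate(subs):
            # Twiddle factor W_N^{pk} = cos(2 pi p k / N) - j sin(2 pi p k / N)
            c = math.cos(2 * math.pi * p * k / n)
            s = -math.sin(2 * math.pi * p * k / n)
            # Complex product (c + js)(sub_re + j sub_im), accumulated into X(k)
            out_re[k] += c * sub_re[m] - s * sub_im[m]
            out_im[k] += c * sub_im[m] + s * sub_re[m]
    return out_re, out_im
```

The recursion evaluates X(k) = A(m) + W_N^k B(m) + W_N^{2k} C(m) directly for clarity; an operation-counted implementation would group these into radix-3 butterflies.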
Appendix C. Radix-3 FFT in R(u)

This appendix presents an algorithm for computing
the radix-3 FFT based on a method which transforms the
array from the complex domain (1, i) to the R(u) domain (1, u).


Arguments 

A  =  Real  portion  of  the  complex  data  sequence  to  be 
transformed.  It  is  dimensioned  to  length  N. 

B  =  Imaginary  portion  of  the  complex  data  sequence 
to  be  transformed.  It  is  dimensioned  to  length  N. 

M = The exponent of 3.

N = The length of the data sequence (N = 3^M).

IW = Work vector dimensioned to length M.

WKC and WKS = Storage arrays dimensioned to length N
and used for sine and cosine look up tables.

RTEST = Set equal to zero or one. If the data sequence
is real, RTEST=1; if the data sequence is complex, RTEST=0.

Usage. This algorithm is an efficient method for
computing the transformation:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \qquad k = 0, 1, \ldots, N-1$$

where X(k) and x(n) are complex valued. This algorithm
restricts N to equal 3^M, where M = 0, 1, 2, ....
For a single variate forward transform:

(1) Specify the input sequences A and B along with
parameters M, N, and RTEST.

(2) Dimension A, B, WKC, WKS, and IW.

(3) Call FFT3RU(A,B,M,N,IW,WKC,WKS,RTEST).

(4) A and B are the output real and imaginary portions of
the complex vector X(k).
Appendix D. Radix-5 FFT Algorithm

This section presents an algorithm for computing the
FFT based on decimation-in-time of the discrete Fourier
transform defined by:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \qquad k = 0, 1, 2, \ldots, N-1$$

where X(k) and x(n) are complex valued. This algorithm
restricts the length of the sequence to be N = 5^m, where m
is an integer.

In  this  appendix  a  FORTRAN  subroutine  FFT5TF  is  listed 
for  computing  the  radix-5  FFT.  This  subroutine  computes 
the  single-variate  complex  Fourier  transform  or  performs 
the  calculation  for  one  variate  of  a  multivariate  transform. 

Arguments 

A  =  Real  portion  of  the  complex  data  sequence  to  be 
transformed.  It  is  dimensioned  to  length  N. 

B = Imaginary portion of the complex data sequence.
It is dimensioned to length N.

M = Exponent of 5, where N = 5^M.

N = Length of the data sequence (N = 5^M).

IW = Work vector of length M.

WKC and WKS = Storage arrays dimensioned to length N
and used for sine and cosine look up tables.


Usage. For a single variate forward transform:

(1) Specify the input sequences A and B along with the
parameters M and N.

(2) Dimension A, B, IW, WKS, and WKC to the correct lengths.

(3) A and B are the output real and imaginary portions of
the complex vector X(k).
Radix-5 FFT Theory

This section presents the theory of the radix-5 FFT,
starting with the DFT definition and then decomposing the
DFT equation using the decimation-in-time algorithm
(Cooley and Tukey, 1965). This development closely
parallels the radix-3 development presented earlier, and
consequently the radix-5 theory will be brief.

The DFT X(k) is computed by separating the discrete
time sequence x(n) into five N/5-point sequences (N must
be of length 5^m, m = 0, 1, 2, ...). X(k) is given by the
DFT expression:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \qquad W_N = e^{-j 2\pi/N} \tag{D.1}$$

Breaking x(n) into five N/5-point sequences yields x(5r),
x(5r+1), x(5r+2), x(5r+3), and x(5r+4). Using these
sequences and Eq (D.1) gives:

$$\begin{aligned} X(k) = {} & \sum_{r=0}^{N/5-1} x(5r)\, W_N^{5rk} + \sum_{r=0}^{N/5-1} x(5r+1)\, W_N^{(5r+1)k} + \sum_{r=0}^{N/5-1} x(5r+2)\, W_N^{(5r+2)k} \\ & + \sum_{r=0}^{N/5-1} x(5r+3)\, W_N^{(5r+3)k} + \sum_{r=0}^{N/5-1} x(5r+4)\, W_N^{(5r+4)k} \end{aligned} \tag{D.2}$$


By regrouping exponents and making the substitution

$$W_N^{5r} = W_{N/5}^{r} \tag{D.3}$$

Eq (D.2) can be written in final form as:
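$$\begin{aligned} X(k) = {} & \sum_{r=0}^{N/5-1} x(5r)\, W_{N/5}^{rk} + W_N^{k} \sum_{r=0}^{N/5-1} x(5r+1)\, W_{N/5}^{rk} + W_N^{2k} \sum_{r=0}^{N/5-1} x(5r+2)\, W_{N/5}^{rk} \\ & + W_N^{3k} \sum_{r=0}^{N/5-1} x(5r+3)\, W_{N/5}^{rk} + W_N^{4k} \sum_{r=0}^{N/5-1} x(5r+4)\, W_{N/5}^{rk} \end{aligned} \tag{D.4}$$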


Each of the N/5-point DFTs in Eq (D.4) represents an N/5-length
sequence, and the terms in front of the summations are the
butterfly multipliers.

Eq (D.4) can be rewritten to reflect the N/5-point DFTs as:

$$X(k) = A(m) + W_N^{k} B(m) + W_N^{2k} C(m) + W_N^{3k} D(m) + W_N^{4k} E(m), \qquad m = k \bmod (N/5) \tag{D.5}$$

where A(m), ..., E(m) are the N/5-point DFTs of x(5r), ...,
x(5r+4). For N = 5^2 = 25 the Eq (D.5) representation is shown in
Figure D.1, which uses a less cumbersome FFT notation (Rabiner and
Gold, 1975). X(k) is obtained by evaluating Eq (D.5) as:

$$\begin{aligned}
X(0) &= A(0) + B(0) + C(0) + D(0) + E(0) \\
X(1) &= A(1) + W_{25}^{1} B(1) + W_{25}^{2} C(1) + W_{25}^{3} D(1) + W_{25}^{4} E(1) \\
X(2) &= A(2) + W_{25}^{2} B(2) + W_{25}^{4} C(2) + W_{25}^{6} D(2) + W_{25}^{8} E(2) \\
&\;\;\vdots \\
X(6) &= A(1) + W_{25}^{6} B(1) + W_{25}^{12} C(1) + W_{25}^{18} D(1) + W_{25}^{24} E(1) \\
&\;\;\vdots \\
X(23) &= A(3) + W_{25}^{23} B(3) + W_{25}^{46} C(3) + W_{25}^{69} D(3) + W_{25}^{92} E(3) \\
X(24) &= A(4) + W_{25}^{24} B(4) + W_{25}^{48} C(4) + W_{25}^{72} D(4) + W_{25}^{96} E(4)
\end{aligned}$$

Figure D.1

The above expressions explicitly describe the first-stage
decimation for N=25. The next step is to evaluate A(m) through
E(m), which are also 5-point DFTs. Writing a(r) = x(5r), the
5-point DFT for A(m) can be evaluated as:

$$A(m) = \sum_{r=0}^{N/5-1} a(r)\, W_{N/5}^{rm} \tag{D.6}$$

which, decimated again, results in five N/25-length sequences:

$$\begin{aligned} A(m) = {} & \sum_{i=0}^{N/25-1} a(5i)\, W_{N/25}^{im} + W_{N/5}^{m} \sum_{i=0}^{N/25-1} a(5i+1)\, W_{N/25}^{im} + W_{N/5}^{2m} \sum_{i=0}^{N/25-1} a(5i+2)\, W_{N/25}^{im} \\ & + W_{N/5}^{3m} \sum_{i=0}^{N/25-1} a(5i+3)\, W_{N/25}^{im} + W_{N/5}^{4m} \sum_{i=0}^{N/25-1} a(5i+4)\, W_{N/25}^{im}, \qquad m = 0, 1, \ldots, 4 \end{aligned} \tag{D.7}$$

It can be seen from Figure D.1 that a(5i) = x(0),
a(5i+1) = x(5), a(5i+2) = x(10), a(5i+3) = x(15), and
a(5i+4) = x(20) for the 5-point DFT of A(m). The final
expression for the A(m) 5-point DFT is given from Eq (D.7)
where N=25:
Figure D.2. Basic Radix-5 Butterfly Using Twiddle Factors.
$$\begin{aligned}
A(0) &= a(0) + W_5^{0} a(1) + W_5^{0} a(2) + W_5^{0} a(3) + W_5^{0} a(4) && \text{(D.8)} \\
A(1) &= a(0) + W_5^{1} a(1) + W_5^{2} a(2) + W_5^{3} a(3) + W_5^{4} a(4) && \text{(D.9)} \\
A(2) &= a(0) + W_5^{2} a(1) + W_5^{4} a(2) + W_5^{6} a(3) + W_5^{8} a(4) && \text{(D.10)} \\
A(3) &= a(0) + W_5^{3} a(1) + W_5^{6} a(2) + W_5^{9} a(3) + W_5^{12} a(4) && \text{(D.11)} \\
A(4) &= a(0) + W_5^{4} a(1) + W_5^{8} a(2) + W_5^{12} a(3) + W_5^{16} a(4) && \text{(D.12)}
\end{aligned}$$

From Eqs (D.8)-(D.12) the basic butterfly multipliers are
derived to be:

$$\begin{aligned}
X(k) &= A(k) + W_N^{k} B(k) + W_N^{2k} C(k) + W_N^{3k} D(k) + W_N^{4k} E(k) && \text{(D.13)} \\
X(k+r) &= A(k) + W_N^{k+r} B(k) + W_N^{2k+2r} C(k) + W_N^{3k+3r} D(k) + W_N^{4k+4r} E(k) && \text{(D.14)} \\
X(k+2r) &= A(k) + W_N^{k+2r} B(k) + W_N^{2k+4r} C(k) + W_N^{3k+6r} D(k) + W_N^{4k+8r} E(k) && \text{(D.15)} \\
X(k+3r) &= A(k) + W_N^{k+3r} B(k) + W_N^{2k+6r} C(k) + W_N^{3k+9r} D(k) + W_N^{4k+12r} E(k) && \text{(D.16)} \\
X(k+4r) &= A(k) + W_N^{k+4r} B(k) + W_N^{2k+8r} C(k) + W_N^{3k+12r} D(k) + W_N^{4k+16r} E(k) && \text{(D.17)}
\end{aligned}$$
Eqs (D.13)-(D.17) are shown in the twiddle factor
butterfly of Figure D.2, where r is the distance between
the butterfly end points. Since N = 5r, the butterfly
multipliers reduce to the constant complex multipliers:

$$\begin{aligned}
W_N^{r} &= W_N^{6r} = W_N^{16r} = \cos(2\pi/5) - j\sin(2\pi/5) \\
W_N^{2r} &= W_N^{12r} = \cos(4\pi/5) - j\sin(4\pi/5) \\
W_N^{3r} &= \left(W_N^{2r}\right)^{*} = W_N^{8r} = \cos(4\pi/5) + j\sin(4\pi/5) \\
W_N^{4r} &= \left(W_N^{r}\right)^{*} = W_N^{9r} = \cos(2\pi/5) + j\sin(2\pi/5)
\end{aligned}$$

These constant butterfly multipliers are computed once
during the FFT computation and used in every radix-5
butterfly.
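The constant multipliers can be checked numerically. The sketch below builds them, applies them in a radix-5 butterfly that produces X(k), X(k+r), ..., X(k+4r) from the already-twiddled A(k), ..., E(k) of Eqs (D.13)-(D.17), and uses that butterfly in a recursive transform following Eq (D.5). It is an illustration, not FFT5TF: the names are hypothetical, and the butterfly is written as a plain sum rather than the 16-multiplication form counted in Appendix E.

```python
import cmath
import math

# Constant butterfly multipliers for N = 5r, as listed above
W1 = complex(math.cos(2 * math.pi / 5), -math.sin(2 * math.pi / 5))  # W_N^r
W2 = complex(math.cos(4 * math.pi / 5), -math.sin(4 * math.pi / 5))  # W_N^{2r}
W3 = W2.conjugate()                                                   # W_N^{3r}
W4 = W1.conjugate()                                                   # W_N^{4r}
W5_POWERS = (1 + 0j, W1, W2, W3, W4)  # W_5^e for e = 0, ..., 4


def radix5_butterfly(a, b, c, d, e):
    """Return X(k), X(k+r), ..., X(k+4r) per Eqs (D.13)-(D.17).

    b, c, d, e must already be multiplied by W_N^k, W_N^{2k}, W_N^{3k}, W_N^{4k}.
    Output q uses W_N^{pqr} = W_5^{pq}, so only the five constants are needed.
    """
    inputs = (a, b, c, d, e)
    return [sum(W5_POWERS[(q * p) % 5] * v for p, v in enumerate(inputs)) for q in range(5)]


def fft5(x):
    """Recursive radix-5 decimation-in-time FFT (Eq D.5); len(x) must be 5**m."""
    n = len(x)
    if n == 1:
        return list(x)
    if n < 5 or n % 5:
        raise ValueError("length must be a power of 5")
    r = n // 5
    subs = [fft5(x[p::5]) for p in range(5)]  # A, B, C, D, E
    out = [0j] * n
    for k in range(r):
        # Apply the twiddle factors W_N^{pk}, then the constant-multiplier butterfly.
        twiddled = [subs[p][k] * cmath.exp(-2j * math.pi * p * k / n) for p in range(5)]
        for q, value in enumerate(radix5_butterfly(*twiddled)):
            out[k + q * r] = value
    return out


def direct_dft(x):
    """O(N^2) reference DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * t * k / n) for t in range(n)) for k in range(n)]
```

With k = 0 the twiddles are all 1 and `radix5_butterfly` reduces to an ordinary 5-point DFT, which is a convenient first check.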
Appendix E. Mixed Radix FFT Algorithm

This section presents an algorithm for computing the
FFT based on the discrete Fourier transform:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}$$

The algorithm described here can accept an N-length sequence
which is factorable by 2, 3, 4, or 5. To aid in selecting
an appropriate length sequence for this algorithm, a list of
numbers less than 50,000 containing no prime factors larger
than five is given in Table E.
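A list like Table E can be generated by enumerating every product 2^a 3^b 5^c below the limit. A sketch follows; the function name is ours, and including N=1 is an assumption about where the table starts.

```python
def five_smooth_lengths(limit):
    """Sorted lengths N < limit with no prime factor larger than 5 (starts at 1)."""
    lengths = []
    p2 = 1
    while p2 < limit:          # 2^a
        p23 = p2
        while p23 < limit:     # 2^a 3^b
            p235 = p23
            while p235 < limit:  # 2^a 3^b 5^c
                lengths.append(p235)
                p235 *= 5
            p23 *= 3
        p2 *= 2
    return sorted(lengths)


print(five_smooth_lengths(31))
```

The call above lists 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, and 30; `five_smooth_lengths(50000)` gives the candidate lengths in the range covered by Table E.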

Arguments 

A = The real portion of the complex data sequence to
be transformed. It is dimensioned to length N.

B = Imaginary portion of the complex data sequence to
be transformed. It is dimensioned to length N.

M = Number of factors of N.

WKC and WKS = Storage arrays dimensioned to length N
and used for sine and cosine look up tables.

N = Length of the sequence to be transformed. N
must be an integer power of 2, 3, 4, 5, or a combination
thereof.

AT and BT = Arrays used in the subroutine for temporary
storage of A and B during the data reordering (digit
reversal).

NFAC = Contains all the factors of N. NFAC is computed
by the user and passed to the subroutine in the argument list.
Dimensioned to length M.
IWK = Contains the powers of 5, 4, 3, and 2 in N and is
dimensioned to length 4.

IWK(1) = powers of 5

IWK(2) = powers of 4

IWK(3) = powers of 3

IWK(4) = powers of 2 (must be 0 or 1)

Usage .  The  subroutine  listed  permits  a  maximum  of  11 
factors  which  is  adequate  for  any  N  less  than  2^  with  the 
factoring  used  by  this  subroutine. 

(1) Dimension arrays A, B, AT, BT, WKC, and WKS to length
N and array NFAC to length M.

(2) Factor N and store the factors in array NFAC. Array NFAC
must contain the factors of N starting with the highest
prime factor, 5, and continuing to the lowest, 2.

E.g., N=480:

NFAC(1) = 5, NFAC(2) = 4, NFAC(3) = 4,
NFAC(4) = 3, NFAC(5) = 2.

(3) Specify the integer powers of 2, 3, 4, and 5 in the
array IWK.

E.g., N=480:

IWK(1) = 1, IWK(2) = 2, IWK(3) = 1, IWK(4) = 1

In general,

N = 2^m 3^n 4^p 5^q and

IWK(1) = q, IWK(2) = p, IWK(3) = n, IWK(4) = m.

(4) Specify values for A, the real part of the data sequence,
and B, the imaginary part of the data sequence.

(5) Call FFTMR(A,B,M,N,WKC,WKS,AT,BT,NFAC,IWK).

(6) A and B contain the real and imaginary parts of the
transform X(k).
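Steps (2) and (3) are mechanical enough to sketch. The helper below is hypothetical (it is not part of FFTMR). It pairs factors of 2 into 4s, which is consistent with the N=480 example and with the rule that IWK(4) must be 0 or 1, and it returns M as the number of factors, as defined in the argument list.

```python
def fftmr_factors(n):
    """Return (M, NFAC, IWK) following usage steps (2) and (3).

    Factors are listed 5s first, then 4s, then 3s, then at most one 2.
    IWK holds the powers of 5, 4, 3, and 2 in that order.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    exps = {}
    k = n
    for p in (5, 3, 2):
        exps[p] = 0
        while k % p == 0:
            k //= p
            exps[p] += 1
    if k != 1:
        raise ValueError("n has a prime factor larger than 5")
    fours, twos = divmod(exps[2], 2)  # pair factors of 2 into 4s; at most one 2 remains
    nfac = [5] * exps[5] + [4] * fours + [3] * exps[3] + [2] * twos
    if len(nfac) > 11:
        raise ValueError("the listed subroutine permits at most 11 factors")
    iwk = [exps[5], fours, exps[3], twos]  # IWK(1)..IWK(4)
    return len(nfac), nfac, iwk


print(fftmr_factors(480))
```

For N=480 this prints `(5, [5, 4, 4, 3, 2], [1, 2, 1, 1])`, matching the example above.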
Operations Count for FFTMR

The operations count for the factorization used in
this algorithm is a function of (1) the number of butterflies,
(2) the number of complex twiddle factors, and
(3) the number of times the cosine and sine difference
equations must be computed. The number of butterflies in
a mixed radix algorithm has been shown to be (Singleton, 1969)

$$\sum_{i=1}^{m} \frac{N}{p_i} \tag{E.1}$$

and the number of complex twiddle factors is:

$$\sum_{i=1}^{m} \frac{N(p_i - 1)}{p_i} - (N - 1) \tag{E.2}$$

where N = p_1 p_2 ... p_m. The radices in this algorithm are
restricted to:

$$N = 2^{r}\, 3^{s}\, 4^{t}\, 5^{u} \tag{E.3}$$

Given the factorization in Eq (E.3), the radix-2 section
(where p = 2) has

$$\sum_{i=1}^{r} \frac{N}{p_i} = \sum_{i=1}^{r} \frac{N}{2} = \frac{rN}{2} \tag{E.4}$$

butterflies, which require four real additions each. The
number of complex twiddle factors for the radix-2 section is
given as:

$$\sum_{i=1}^{r} \frac{N(p_i - 1)}{p_i} = \sum_{i=1}^{r} \frac{N}{2} = \frac{rN}{2} \tag{E.5}$$

which require four real multiplications and two real
additions each. Notice that the N-1 term has not been
subtracted as in Eq (E.2). The N-1 term will be subtracted
after the total operations count has been derived for factors of
3, 4, and 5 and combined with factors of 2. Using Eqs
(E.4)-(E.5) and the number of additions and multiplications
required for each gives the operations count for the
radix-2 section as:

$$\text{real mult} = 4\left(\frac{rN}{2}\right) = 2rN \tag{E.6}$$

$$\text{real adds} = 4\left(\frac{rN}{2}\right) + 2\left(\frac{rN}{2}\right) = 3rN \tag{E.7}$$

The radix-3 section requires 4 real multiplications
and 12 real additions per butterfly and 4 real
multiplications and 2 real additions per complex twiddle factor.
Using Eqs (E.1) and (E.2), the number of butterflies for
p = 3 is:

$$\sum_{i=1}^{s} \frac{N}{p_i} = \sum_{i=1}^{s} \frac{N}{3} = \frac{sN}{3} \tag{E.8}$$

and the number of twiddle factors (neglecting the N-1 term)
is:

$$\sum_{i=1}^{s} \frac{N(p_i - 1)}{p_i} = \frac{2sN}{3} \tag{E.9}$$

Combining the additions and multiplications required for
each butterfly and twiddle factor with Eqs (E.8)-(E.9)
gives the operations count for the radix-3 section as:

$$\text{real mult} = 4\left(\frac{sN}{3}\right) + 4\left(\frac{2sN}{3}\right) = 4sN \tag{E.10}$$

$$\text{real adds} = 12\left(\frac{sN}{3}\right) + 2\left(\frac{2sN}{3}\right) = \frac{16sN}{3} \tag{E.11}$$

The  radix-4  sectic.i  has  zero  real  multiplications 
and  16  real  addition  per  butterfly  with  4  real 
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multiplications  and  2  real  additions  per  twiddle  factor. 
The  number  of  butterflies,  where  p=4 ,  is  given  by: 


t  t 

E  (N/p.)  =  E  (N/4)  =  tN/4  (E.  12) 

i=l  1  i=l 

the  number  of  twiddle  factors  is: 
t  t 

E  (N(p.-l)/p. )  =  E  (3N/4)  =  3 tN/4  (E.13) 

i=l  1  1  i=l 

Using  the  number  of  multiplications  and  additions  per 
butterfly  and  twiddle  factor  in  Eqs  (E.12)  -  (E.13)  gives 
the  total  operations  for  factors  of  4  as: 

real  mult  =  4(3tN/4)  =  3tN  (E.14) 

real adds = 16(tN/4) + 2(3tN/4) = 11tN/2 (E.15)
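The radix-4 butterfly needs no real multiplications because the 4-point DFT kernel contains only +1, -1, +j, and -j, and multiplying by -j merely swaps the real and imaginary parts and changes one sign. A minimal Python sketch (hypothetical function name; the twiddle factors applied before the butterfly are omitted) performs 8 complex additions, i.e. the 16 real additions counted above:

```python
def radix4_butterfly(x0, x1, x2, x3):
    """4-point DFT X(k) = sum_n x(n) exp(-j 2 pi n k / 4); no multiplications.

    Returns ((X0, X1, X2, X3), real_adds).
    """
    a = x0 + x2
    b = x0 - x2
    c = x1 + x3
    d = x1 - x3
    adds = 8                          # four complex additions so far
    mjd = complex(d.imag, -d.real)    # -j*d: swap parts, flip one sign
    X0 = a + c
    X2 = a - c
    X1 = b + mjd                      # (x0 - x2) - j(x1 - x3)
    X3 = b - mjd                      # (x0 - x2) + j(x1 - x3)
    adds += 8
    return (X0, X1, X2, X3), adds
```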

The radix-5 section requires 16 real multiplications
and 32 real additions per butterfly, with 4 real multiplications
and 2 real additions per twiddle factor. Using Eqs (E.1) and
(E.2) with p=5 gives the total number of butterflies as:
$$\sum_{i=1}^{u} N/p_i = \sum_{i=1}^{u} N/5 = uN/5 \qquad \text{(E.16)}$$

and the number of twiddle factors as:

$$\sum_{i=1}^{u} N(p_i-1)/p_i = \sum_{i=1}^{u} 4N/5 = 4uN/5 \qquad \text{(E.17)}$$

Combining Eqs (E.16) - (E.17) with the operations required
for butterflies and twiddle factors in the radix-5 section
gives the totals:

real mult = 16(uN/5) + 4(4uN/5) = 32uN/5
real adds = 32(uN/5) + 2(4uN/5) = 8uN          (E.18)


Using the results of Eqs (E.4) - (E.18) and subtracting
the N-1 complex twiddles gives the number of real operations
used for butterflies and twiddle factors in the mixed
radix algorithm. The expressions are:


real mult = 2rN + 4sN + 3tN + 32uN/5 - 4(N-1)          (E.19)

real adds = 3rN + 16sN/3 + 11tN/2 + 8uN - 2(N-1)          (E.20)


Recall that Eqs (E.19) - (E.20) account for only two
of the three sources of real operations in this algorithm.
The third source is the computation of the sine and cosine
lookup table. In the FORTRAN program in this appendix, the
lookup table is computed by:

WKC(I) = C*WKC(I-1) - S*WKS(I-1) + WKC(I-1)          (E.21)
WKS(I) = C*WKS(I-1) + S*WKC(I-1) + WKS(I-1)          (E.22)


Each equation requires 5 real additions and 2 real
multiplications, and both are computed N-1 times for the
mixed radix FFT. The real operations required to compute
the lookup table are:

real mult = 4(N-1) (E.23)

real adds = 10(N-1) (E.24)
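The recurrence of Eqs (E.21) - (E.22) advances the table by a fixed angle at each step. The values of C and S are set elsewhere in the program; the Python sketch below assumes C = cos(delta) - 1 and S = sin(delta) with delta = 2*pi/N, which turns each step into a rotation by delta. A likely reason for this form is that the small correction C loses less precision than cos(delta) itself would, but that is an assumption. The function name trig_table is hypothetical.

```python
import math


def trig_table(n):
    """Cosine/sine table built with the recurrence of Eqs (E.21)-(E.22).

    Assumption: C = cos(delta) - 1 and S = sin(delta), delta = 2*pi/n,
    so that WKC(I) + j*WKS(I) = exp(j * I * delta).
    """
    delta = 2.0 * math.pi / n
    c = math.cos(delta) - 1.0    # small correction term (assumed meaning of C)
    s = math.sin(delta)          # assumed meaning of S
    wkc = [1.0] * n
    wks = [0.0] * n
    for i in range(1, n):        # the recurrence runs n - 1 times
        wkc[i] = c * wkc[i - 1] - s * wks[i - 1] + wkc[i - 1]
        wks[i] = c * wks[i - 1] + s * wkc[i - 1] + wks[i - 1]
    return wkc, wks
```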
Combining Eqs (E.23) - (E.24) with the real operations
for butterflies and twiddle factors gives the total
real operations for the mixed radix FFT:

real mult = 2rN + 4sN + 3tN + 32uN/5 - 4(N-1) + 4(N-1)
          = 2rN + 4sN + 3tN + 32uN/5          (E.25)

real adds = 3rN + 16sN/3 + 11tN/2 + 8uN - 2(N-1) + 10(N-1)
          = 3rN + 16sN/3 + 11tN/2 + 8uN + 8(N-1)          (E.26)
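Eqs (E.25) - (E.26) are easy to tabulate. The Python sketch below assumes, as in Eqs (E.6) - (E.18), that r, s, t, and u count the factors of 2, 3, 4, and 5, so that N = 2^r 3^s 4^t 5^u; exact fractions avoid rounding in terms such as 16sN/3. The function name mixed_radix_ops is hypothetical.

```python
from fractions import Fraction


def mixed_radix_ops(n, r, s, t, u):
    """Total real operations of Eqs (E.25)-(E.26) for the mixed radix FFT.

    Assumption: r, s, t, u count the factors 2, 3, 4, 5, so N = 2^r 3^s 4^t 5^u.
    The count covers butterflies, twiddle factors and the sine/cosine table.
    """
    if 2**r * 3**s * 4**t * 5**u != n:
        raise ValueError("n must equal 2^r * 3^s * 4^t * 5^u")
    mult = 2*r*n + 4*s*n + 3*t*n + Fraction(32*u*n, 5)
    adds = (3*r*n + Fraction(16*s*n, 3) + Fraction(11*t*n, 2)
            + 8*u*n + 8*(n - 1))
    return mult, adds


mult, adds = mixed_radix_ops(30, 1, 1, 0, 1)   # N = 2 * 3 * 5
print(mult, adds)                              # 372 722
```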


Development  of  the  Mixed 
Radix  Digit-Reversed  Algorithm 

Assuming that the number of points to be transformed
satisfies $N = r_1 r_2 \cdots r_m$, where $r_1, r_2, \ldots, r_m$ are
integer values, the indices of x(n) and X(k) can be
expressed as (Brigham, 1974):

$$n = n_{m-1}(r_2 r_3 \cdots r_m) + n_{m-2}(r_3 r_4 \cdots r_m) + \cdots + n_1 r_m + n_0 \qquad \text{(E.27)}$$

$$k = k_{m-1}(r_1 r_2 \cdots r_{m-1}) + k_{m-2}(r_1 r_2 \cdots r_{m-2}) + \cdots + k_1 r_1 + k_0 \qquad \text{(E.28)}$$

where

$$k_{i-1} = 0, 1, 2, \ldots, r_i - 1, \qquad 1 \le i \le m$$
$$n_i = 0, 1, 2, \ldots, r_{m-i} - 1, \qquad 0 \le i \le m-1$$

For $N = 30 = 2 \times 3 \times 5 = r_1 r_2 r_3$ and $m = 3$, the input sequence
x(n) counter is:

$$n = n_2(15) + n_1(5) + n_0 \qquad \text{(E.29)}$$

where

$$n_0 = 0, 1, 2, 3, 4; \quad n_1 = 0, 1, 2; \quad n_2 = 0, 1$$

The output counter k for X(k) is:

$$k = k_2(6) + k_1(2) + k_0$$

where

$$k_0 = 0, 1; \quad k_1 = 0, 1, 2; \quad k_2 = 0, 1, 2, 3, 4$$

To implement the general digit-reversed counter, let the
input counter n use the digit-reversed multipliers of the
output counter k:

$$n = n_{m-1} + n_{m-2}(r_1) + \cdots + n_1(r_1 r_2 \cdots r_{m-2}) + n_0(r_1 r_2 \cdots r_{m-1}) \qquad \text{(E.30)}$$

For the example $N = 30 = 2 \times 3 \times 5$, the digit-reversed
counter becomes:

$$n = n_2 + 2n_1 + 6n_0 \qquad \text{(E.31)}$$

where, as before, $n_0 = 0, 1, 2, 3, 4$; $n_1 = 0, 1, 2$; and $n_2 = 0, 1$.
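The two counters can be checked mechanically. The Python sketch below (function name hypothetical) extracts the digits n_0, ..., n_{m-1} of each natural index n according to Eq (E.27) and reassembles them with the digit-reversed weights of Eq (E.30). It only lists the index pairing; whether the data are reordered before or after the transform is left to the program using it.

```python
def digit_reversed_order(radices):
    """For radices [r_1, ..., r_m], map each natural index n (Eq E.27)
    to its digit-reversed index (Eq E.30). Returns a list indexed by n."""
    m = len(radices)
    total = 1
    for r in radices:
        total *= r
    order = []
    for n in range(total):
        # Eq (E.27): n_0 has weight 1 and range r_m, n_1 has weight r_m
        # and range r_{m-1}, and so on; digits[i] holds n_i.
        digits = []
        rest = n
        for r in reversed(radices):
            digits.append(rest % r)
            rest //= r
        # Eq (E.30): n_{m-1} has weight 1, n_{m-2} weight r_1, ...,
        # n_0 weight r_1 r_2 ... r_{m-1}.
        rev = 0
        weight = 1
        for i in range(m - 1, -1, -1):
            rev += digits[i] * weight
            weight *= radices[m - 1 - i]
        order.append(rev)
    return order


print(digit_reversed_order([2, 2, 2]))   # [0, 4, 2, 6, 1, 5, 3, 7]
```

For radices (2, 3, 5) the result reproduces n_2 + 2n_1 + 6n_0 of Eq (E.31); with all radices equal to 2 it reduces to ordinary bit reversal.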
Appendix F. Singleton's Mixed Radix FFT

This program was written by R.C. Singleton and published
by the IEEE Press in "Programs for Digital Signal
Processing". It computes the DFT defined by:

$$X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j 2\pi nk/N)$$

It  also  computes  the  1/N  scaled  inverse  Fourier  transform. 
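For checking any of the FFT programs in these appendices, the defining sum can be evaluated directly. The O(N^2) Python sketch below (function name hypothetical) implements the forward DFT above and, with inverse=True, the 1/N-scaled inverse; it is a reference evaluation, not Singleton's algorithm.

```python
import cmath


def dft(x, inverse=False):
    """Direct O(N^2) DFT: X(k) = sum_n x(n) exp(-j 2 pi n k / N).

    With inverse=True the exponent sign is positive and the result is
    scaled by 1/N.
    """
    n = len(x)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        acc = 0j
        for m, xm in enumerate(x):
            acc += xm * cmath.exp(sign * 2j * cmath.pi * m * k / n)
        result.append(acc / n if inverse else acc)
    return result
```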

The subroutine listed in this appendix factors N into
"square" and "square-free" factors and stores the results
in an array NFAC. It then calls subroutine FFTMX to compute
the complex Fourier transform, twiddle the data, and
reorder the complex array to final order.
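The square/square-free split can be illustrated with a simplified Python sketch. The ordering used here (square factors first, square-free factors in the middle, the square factors again in reverse at the end) and the treatment of a leftover 4 are assumptions modeled on Singleton's published routine, not a transcription of FFTSNG; the function name is hypothetical.

```python
def square_factor_order(n):
    """Simplified illustration of a Singleton-style factor ordering.

    Assumed rules: a 4 for each factor of 16, an odd j for each factor
    j*j, and a 2 for a leftover factor of 4 are square factors; they are
    placed first and again (reversed) last, around the square-free factors.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    square = []
    k = n
    while k % 16 == 0:
        square.append(4)
        k //= 16
    j = 3
    while j * j <= k:
        while k % (j * j) == 0:
            square.append(j)
            k //= j * j
        j += 2
    square_free = []
    if k <= 4:
        if k != 1:
            square_free.append(k)   # 2, 3, or 4 remains a single factor
    else:
        if k % 4 == 0:
            square.append(2)
            k //= 4
        j = 2
        while j <= k:               # what remains is square-free
            if k % j == 0:
                square_free.append(j)
                k //= j
            j = 3 if j == 2 else j + 2
    return square + square_free + square[::-1]


print(square_factor_order(336))   # [4, 3, 7, 4]
```

With N = 4 x 3 x 7 x 4 = 336 this reproduces the glossary's example of two square factors (the 4s) around the square-free factors 3 and 7.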

Use of this subroutine for multi-variate transforms is
described in the comments section at the beginning of the
program. A multi-variate transform is basically a
single-variate transform with modified indexing (Singleton, 1977).

The subroutine listed accepts any sequence length that
has 15 or fewer factors.

The smallest length that has more than 15 factors, as this
subroutine factors it, is 12,754,584 (2^3 x 3^13); if this
condition is encountered, an error message is printed.

The transform portion of the subroutine includes
sections for factors of 2, 3, 4, and 5 as well as a general
section for odd prime factors. The special sections for
2 and 4 include the twiddle factor multiplication within these
special sections instead of using the general twiddle factor
section. "Performing the transform in this manner produces
a 10 percent speed improvement over the general
twiddle section" (Singleton, 1969). The special sections
for 3 and 5 are similar to the general odd factor section
but reduce the indexing required and thus improve the
speed (Singleton, 1969).

Arguments. The Singleton FFT for computing a complex
single-variate transform is called using the following
arguments:

A  =  The  real  part  of  the  array  to  be  transformed  and 
is  dimensioned  to  length  N. 

B  =  The  imaginary  part  of  the  array  to  be  transformed 
and  is  dimensioned  to  length  N. 

N  =  Length  of  the  input  sequence  N  which  must  be  a 
positive  integer  with  no  more  than  15  factors. 

NSPN  =  The  spacing  of  consecutive  data  values  while 
indexing  the  current  variable  (in  units  determined  by  the 
magnitude  of  ISN) . 

ISN = The sign of ISN determines the transform direction
(negative for forward and positive for inverse). The
magnitude of ISN determines the indexing increment for
arrays A and B. Normally the magnitude of ISN is unity.

NSEG  =  An  integer  value  such  that  NSEG  x  N  x  NSPN 
equals  the  total  number  of  complex  data  values. 
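One way to picture NSEG, N, and NSPN is to view the data as an NSEG x N x NSPN array stored with the NSPN index varying fastest: the variable being transformed is the middle index, whose consecutive values lie NSPN positions apart. The Python sketch below applies that interpretation with a direct DFT; it is a hypothetical helper, not FFTSNG, it handles the forward direction only, and it assumes the magnitude of ISN is unity.

```python
import cmath


def transform_variable(data, nseg, n, nspn):
    """Forward DFT along one variable of a flattened complex array.

    Interpretation assumed: the data form nseg segments, and inside each
    segment the n values of the current variable are nspn positions apart,
    so len(data) == nseg * n * nspn.
    """
    if len(data) != nseg * n * nspn:
        raise ValueError("expected nseg * n * nspn complex values")
    out = list(data)
    for seg in range(nseg):
        for offset in range(nspn):
            base = seg * n * nspn + offset
            x = [data[base + i * nspn] for i in range(n)]
            for k in range(n):
                out[base + k * nspn] = sum(
                    x[i] * cmath.exp(-2j * cmath.pi * i * k / n)
                    for i in range(n))
    return out
```

Under this interpretation, a two-variable array with N1 values varying fastest and N2 values varying slowest would be transformed along its first variable with NSEG=N2, N=N1, NSPN=1, and along its second with NSEG=1, N=N2, NSPN=N1.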
Usage. For a single-variate forward transform:

(1) Specify the input sequences A and B and the parameters
NSEG=1, N=transform length, NSPN=1, and ISN=-1.

(2) Dimension A and B to length N.

(3) Call FFTSNG(A,B,NSEG,N,NSPN,ISN).

(4) A and B are the output real and imaginary portions
of the complex vector X(k).

To  perform  a  real  valued,  inverse,  or  multi-variate 
transform  refer  to  the  comments  portion  of  FFTSNG. 
Appendix G. IMSL Mixed Radix FFT

The International Mathematical and Statistical Libraries
(IMSL) contain a mixed radix subroutine that can perform the
FFT of a sequence of any positive integer length. This
subroutine was based on Singleton's article "On Computing the
Fast Fourier Transform", Comm. ACM 10(10), 1967, in which he
proposed several ideas used in the IMSL subroutine. As
stated in Chapter III, the program closely resembles
Singleton's algorithm published in the open literature,
but the IMSL version is copyrighted and its FORTRAN
code is not listed in this paper. The IMSL description of
the algorithm and its usage are included in this appendix
for the convenience of the reader, together with a detailed
development of the real operations count, which was not
presented in the main text.
Real Operations Count For
IMSL Mixed Radix Algorithm

A copyrighted mixed radix FFT is available through
the International Mathematical and Statistical Libraries (IMSL)
on the CDC computer used at AFIT. This subroutine (FFTCC)
can accept a sequence of any length N, including prime lengths.

It is based on an article written by Singleton, "On
Computing the Fast Fourier Transform", published in 1967.

Functionally this subroutine differs little from
Singleton's algorithm described in the preceding appendix.
The factoring, twiddle factors, and reordering of the data
are the same; however, the special sections for factors of
3 and 4 require 2 and 8 more additions per butterfly,
respectively, than Singleton's subroutine. Also, this mixed
radix algorithm uses the general section for odd prime factors
of 5 or greater, which further reduces its efficiency compared
to Singleton's.

As in the case of Singleton's FFT subroutine, the real
operations count for the IMSL subroutine is determined from
the number of twiddle factors:

$$\sum_{i=1}^{m} N(p_i-1)/p_i - (N-1) \qquad \text{(G.1)}$$

and the number of butterflies:

$$\sum_{i=1}^{m} N/p_i \qquad \text{(G.2)}$$
where $N = p_1 p_2 \cdots p_m$. In this subroutine the factoring is
performed such that $N = 2^r 3^s 4^t p_1^{m_1} \cdots p_k^{m_k}$, with the
real operations count being derived from the FORTRAN-coded
subroutine FFTCC and Eqs (G.1) and (G.2). The radix-2
section of FFTCC includes the twiddle factor multiplications
with the butterfly computation. In this case there are rN/2
butterflies and twiddle factors to be computed using 4 real
multiplications and 6 real additions, giving:

# real mult = 4(rN/2) = 2rN (G.3)

# real adds = 6(rN/2) = 3rN (G.4)

The radix-3 section uses sN/3 butterflies and 2sN/3 twiddle
factors, which require 4 real multiplications and 14 real additions
per butterfly and 4 real multiplications and 2 real additions
per twiddle factor. Combining the butterflies and twiddle
factors, the real operations count for the radix-3 section
is given by:

real mult = 4(2sN/3) + 4(sN/3) = 4sN (G.5)

real adds = 14(sN/3) + 2(2sN/3) = 6sN (G.6)

The radix-4 section uses 24 real additions and no real
multiplications for each of the tN/4 butterflies. The 3tN/4
twiddle factors each require 2 real additions and 4 real
multiplications. Combining the results gives:

real mult = 3tN (G.7)

real adds = 24tN/4 + 2(3tN/4) = 15tN/2 (G.8)
All odd prime factors equal to or greater than 5 use
the general transform section. Based on the FORTRAN
program written by IMSL, there are five sources of real
operations in this general radix-$p_i$ transform, excluding the
array indexing additions. First, the complex multipliers
are computed for the butterfly transmittance:

$$\text{real mult} = 2(p_i-1) \qquad \text{(G.9)}$$

$$\text{real adds} = (p_i-1) \qquad \text{(G.10)}$$

for each new factor $p_i$; e.g., N=7x4=28 and N=7x7x4=196
each require the same $(p_i-1) = (7-1)$ complex multiplications
for the factor $p_i = 7$. Second, the complex twiddle factor
multiplications are performed on the data array. Assuming
N can be factored as:

$$N = 2^r 3^s 4^t p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}$$

where $p_i^{m_i}$ represents the i-th factor raised to some positive
integer $m_i$, the number of complex twiddles is $m_i N(p_i-1)/p_i - (N-1)$.
The N-1 term is subtracted only once for each FFT,
which means the intermediate result can be written as:

$$\text{real mult} = 4 m_i N(p_i-1)/p_i \qquad \text{(G.11)}$$

$$\text{real adds} = 2 m_i N(p_i-1)/p_i \qquad \text{(G.12)}$$

The individual butterflies are computed next. The first
output of each butterfly requires only $8(p_i-1)/2$ real
additions and no multiplications. For each radix-$p_i^{m_i}$ there
are $m_i N/p_i$ butterflies in the FFT, giving:

$$\text{real adds} = \left(8(p_i-1)/2\right)\left(N m_i/p_i\right) = 4N m_i (p_i-1)/p_i \qquad \text{(G.13)}$$

Now the remaining portion of each butterfly is computed
using $(p_i-1)^2$ real multiplications and additions. This
gives a total of:

$$\text{real mult} = N(p_i-1)^2 m_i/p_i \qquad \text{(G.14)}$$

$$\text{real adds} = N(p_i-1)^2 m_i/p_i \qquad \text{(G.15)}$$

Finally, the results of the butterfly operations are stored
in the proper array locations, requiring 4 real additions
times $(p_i-1)/2$ times the number of radix-$p_i$ butterflies.
This total is:

$$\text{real adds} = \left(4(p_i-1)/2\right)\left(N m_i/p_i\right) = 2 m_i N(p_i-1)/p_i \qquad \text{(G.16)}$$

Combining Eqs (G.9) - (G.16), the number of real operations
for the $p_i$ factors becomes:

$$\text{real mult} = \sum_{i=1}^{k} \left( 2(p_i-1) + 4 m_i N(p_i-1)/p_i + N(p_i-1)^2 m_i/p_i \right) \qquad \text{(G.17)}$$

$$\begin{aligned}
\text{real adds} &= \sum_{i=1}^{k} \big( (p_i-1) + 2 m_i N(p_i-1)/p_i + 4 m_i N(p_i-1)/p_i + N(p_i-1)^2 m_i/p_i + 2 m_i N(p_i-1)/p_i \big) \\
&= \sum_{i=1}^{k} \big( (p_i-1) + 8 m_i N(p_i-1)/p_i + N(p_i-1)^2 m_i/p_i \big) \qquad \text{(G.18)}
\end{aligned}$$
Using Eqs (G.17) and (G.18) for the odd prime factors and
the real operations count for factors of 2, 3, and 4, the
total operations count for $N = 2^r 3^s 4^t p_1^{m_1} \cdots p_k^{m_k}$ can
be written as:

$$\text{real mult} = 2rN + 4sN + 3tN + \sum_{i=1}^{k} \left( 2(p_i-1) + 4 m_i N(p_i-1)/p_i + N m_i (p_i-1)^2/p_i \right) - 4(N-1) \qquad \text{(G.19)}$$

$$\text{real adds} = 3rN + 6sN + 15tN/2 + \sum_{i=1}^{k} \left( (p_i-1) + 8 m_i N(p_i-1)/p_i + N m_i (p_i-1)^2/p_i \right) - 2(N-1) \qquad \text{(G.20)}$$

As in any FFT, the real operations associated with the
twiddle factors have been reduced by N-1 complex twiddles
(4(N-1) real multiplications and 2(N-1) real additions),
because the last stage of a decimation-in-frequency FFT, or
the first stage of a decimation-in-time FFT, requires no twiddles.
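Eqs (G.19) - (G.20) can be evaluated directly for any admissible length. In the Python sketch below, r, s, and t count the factors 2, 3, and 4, and the hypothetical argument `odd` lists the (p_i, m_i) pairs for the odd prime factors of 5 or greater that FFTCC sends to its general section. The function name is hypothetical.

```python
from fractions import Fraction


def imsl_mixed_radix_ops(n, r, s, t, odd=()):
    """Real operation counts of Eqs (G.19)-(G.20).

    N = 2^r 3^s 4^t p_1^m_1 ... p_k^m_k; `odd` holds (p_i, m_i) pairs for
    the odd primes >= 5 handled by the general section (assumed input form).
    """
    prod = 2**r * 3**s * 4**t
    for p, m in odd:
        prod *= p**m
    if prod != n:
        raise ValueError("factors do not multiply to n")
    # Factors 2, 3, 4 and the single N-1 twiddle reduction.
    mult = 2*r*n + 4*s*n + 3*t*n - 4*(n - 1)
    adds = 3*r*n + 6*s*n + Fraction(15*t*n, 2) - 2*(n - 1)
    # General radix-p_i section, Eqs (G.17)-(G.18).
    for p, m in odd:
        mult += 2*(p - 1) + Fraction(4*m*n*(p - 1), p) + Fraction(n*m*(p - 1)**2, p)
        adds += (p - 1) + Fraction(8*m*n*(p - 1), p) + Fraction(n*m*(p - 1)**2, p)
    return mult, adds


mult, adds = imsl_mixed_radix_ops(28, 0, 0, 1, [(7, 1)])   # N = 4 * 7
print(mult, adds)                                          # 228 498
```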
Appendix H.

An Algorithm for Computing the WFTA


This program computes the DFT defined by:

$$X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j 2\pi nk/N), \qquad k = 0, 1, \ldots, N-1$$

where the sequence length N is a product of relatively
prime factors from the set (2, 3, 4, 5, 7, 8, 9, 16).

Program Description. The WFTA consists of the six
subroutines PERM 1, PERM 2, MULT, WEAVE 1, WEAVE 2, and
INISHL. Step One maps the sequence x(n) into a
u-dimensional array $s(n_1, n_2, \ldots, n_u)$. Step Two implements
the "pre-weave" modules in subroutine WEAVE 1, one for each
factor $N_j$. Each of the pre-weave modules contains only
additions. Step Three multiplies the data array point by point
(subroutine MULT) by real constants derived from the small-N
DFT algorithms. These constant multipliers are functions of
the complex exponentials $W_N$ and are the only nontrivial
multiplications required in the algorithm. Step Four implements
the post-weave module (subroutine WEAVE 2), which contains
additions, subtractions, and multiplications by j. Step Five maps
the u-dimensional array $s(k_1, k_2, \ldots, k_u)$ into the correct
one-dimensional DFT X(k) according to the Chinese remainder
theorem given in Eq (3.144) (McClellan and Nawab, 1979).
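The index mapping of Steps One and Five can be illustrated for two mutually prime factors. The Python sketch below assumes one standard choice of input map, n = (N2*n1 + N1*n2) mod N, and uses the Chinese remainder theorem for the output map; with this pairing the length-N DFT splits into short DFTs along each dimension with no twiddle factors. This is the prime factor (Good-Thomas) structure only: the WFTA additionally nests the pre-additions, multiplications, and post-additions of the short DFTs, which the sketch does not attempt. All function names are hypothetical, and pow with a negative exponent requires Python 3.8 or later.

```python
import cmath


def direct_dft(x):
    """Direct evaluation of X(k) = sum_n x(n) exp(-j 2 pi n k / N)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def crt_index(k1, k2, n1, n2):
    """Unique k in [0, n1*n2) with k % n1 == k1 and k % n2 == k2."""
    e1 = n2 * pow(n2, -1, n1)   # congruent to 1 mod n1, 0 mod n2
    e2 = n1 * pow(n1, -1, n2)   # congruent to 0 mod n1, 1 mod n2
    return (k1 * e1 + k2 * e2) % (n1 * n2)


def prime_factor_dft(x, n1, n2):
    """Length n1*n2 DFT (gcd(n1, n2) == 1) computed without twiddle factors."""
    n = n1 * n2
    # Step One (assumed input map): s[a][b] = x((n2*a + n1*b) mod N).
    s = [[x[(n2 * a + n1 * b) % n] for b in range(n2)] for a in range(n1)]
    # Short DFTs of length n2 along one dimension, then length n1.
    s = [direct_dft(row) for row in s]
    cols = [direct_dft([s[a][b] for a in range(n1)]) for b in range(n2)]
    # Step Five: output map by the Chinese remainder theorem.
    X = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            X[crt_index(k1, k2, n1, n2)] = cols[k2][k1]
    return X
```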

Arguments. The WFTA is called using the following
arguments. More arguments appear in this list than in the
one given by McClellan and Nawab because array storage is
minimized in this WFTA version.


N = Transform length, which must be factorable into
mutually prime factors from the set 2, 3, 4, 5, 7, 8, 9, 16.

A list of acceptable sequence lengths is given in the
leftmost column of Table 3.9a,b.
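Because the factors must be mutually prime, at most one factor can be taken from each of the groups {2, 4, 8, 16}, {3, 9}, {5}, and {7}. The Python sketch below enumerates every such product greater than 1; it is derived from that rule only, and Table 3.9a,b in the text remains the reference list. The function and constant names are hypothetical.

```python
from itertools import product

# Members of a group share a prime, so at most one of each group may appear;
# 1 stands for "no factor from this group".
GROUPS = [(1, 2, 4, 8, 16), (1, 3, 9), (1, 5), (1, 7)]


def wfta_lengths():
    """All N > 1 that are products of mutually prime factors from the set."""
    lengths = set()
    for a, b, c, d in product(*GROUPS):
        n = a * b * c * d
        if n > 1:
            lengths.add(n)
    return sorted(lengths)


lengths = wfta_lengths()
print(len(lengths), lengths[-1])   # 59 5040
```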

XR and XI = The real and imaginary arrays to be transformed;
they are dimensioned to length N in the calling
program.

INIT = A flag specifying whether the call to FFTWIN
requires initialization. INIT = 0 means initialization
is required and INIT ≠ 0 skips this phase. Initialization
is needed when calling FFTWIN for the first time for a
given sequence length.

IERR = Contains an error code upon return from FFTWIN.
If the DFT was successful, IERR = 0; if an error occurred,
IERR = -1 or -2. There are two causes for an error:

(1) The transform length is illegal, or

(2) The program has not been initialized for
the correct length N sequence.

SR and SI = One-dimensional working arrays of length
M = M1 x M2 x M3 x M4, which is the product of the numbers of
multiplies required by the small-N algorithms. The value of M
for any permissible N is given in the rightmost column of
Table H.1.

COEF = One-dimensional array of length M used to store
the constant coefficients generated by INISHL for the
"weave" modules.
INDX1 and INDX2 = One-dimensional length-N mapping
vectors for the pre- and post-permutations of the data.

Usage

(1) Specify the input sequences XR and XI with the parameters
N, INIT, IERR, SR, SI, COEF, INDX1, INDX2.

(2) Call WFTA(XR, XI, N, INIT, IERR, SR, SI, COEF,
INDX1, INDX2).

(3) XR and XI are the output real and imaginary vectors.
The error code IERR=0 specifies successful completion
of the transform.

(4) After the initial call, use INIT ≠ 0 as long as N
remains constant.


4  i 


c  8  U  = 

.-’9  N  = 

9 1)  o=»: 

3 1  IJ=C 
35  0=C 
33  fi=  1  fm 
■340=1: 

.35  0=C 
3h  0=C 


I  ‘  i  >  T  .  E  II 
f  c  .  T  =  -  : 


1 1  1 


1 1-  1  ;■! .  -  t.‘> .  ri  1 

iepp=-e 

petupn 


i l  ij  1 1 


PPOOPhM  not  initialized  fop  TH1 

N  M  I.I  L  T  =  N  D 1  ♦  N  0  3  ♦  N  P ♦  N  B  4 
PERMUTE  THE  INPUT  DATA 


A L  1.1  E  OF  N 
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L  HLL  p  e  R  M 1  1  4  R  «  \  I  f  X 
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-;4ri  =  .' 

DO  T HE  PRE  —  t  h  ‘.'E 
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1  tl.-i  j  ri 

“'L  11  =  1. 
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EN|i 
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•  N D  1  «  Nile!  *  N  l*  •  i i  U*4 
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•  lr  1 

=.i  1  (-.H  r  ♦•I",  -  t 

4  “  *  '  '  = 

HP  - 

=  t.P-l  vnPF-F 

4^  -  0  = 

T  1  = 

r  i  f  F'  1  1  V  F  * 

t  <H  » 

4-. 1 1  = 

-  P  1 

rit-M  V  •  =  -  1  r i 

f  m  6  ■*  ♦*  T  t 

4*".  4  1 1  = 

~  hr*  » 

r  i  F'  —  F  •  nf  | 

*  *  .  P  i  N  P* 

4-L,U  = 

4  i 

rip  1  ■  =  '  1 

4  ”  n  = 

*  i  = 

i  1 .  i  1  •  !  ■ 

i  »  -  i 

4“  H  = 

.  i  • 

:••*'=.  i  '  N 

r-M  '  0  •  ♦  T  1 

4-  r  i;  = 

i  ■ 

M-v-s  I-rtCl 

•  -  ‘  [  .  mp  :• 

4C-  -m  'i  = 

■  t  ■ 

-!•'  l  1  =  r  i 

4c.  Ii  li=  t  (i 

i 

b  =  N  b  h 

461  n=  *-;  ~<0 

HP6 

:•  E=N  pH  '  F.  ♦NL NPO' 

46if*  H=. -.4  H 

hph 

;  E=fi  Oh  ■"  f  ♦r*i_. 

i.iP  E  9 

4 c,  *1 1  = 

i  F  '• 

NB.NE.-*'  80 

TO  7  fni 

4r.4l.  =  ,. 


4b50=0 

48b  0=0- 

4b?  0=i" 

4880=0 

4890=0 

4700=0 

4710= 

4780= 

4730= 

4740= 

4750= 

478  0  = 

4  {  7  I  *  = 
4780= 


THE  FOLLOWING  OOt'E  IMPLEMENT  I  Hfc'  9  POINT  PPE-WEHvE 


NLuP£=l 0*N01 

NL UP 2 3=1 1 ♦NOl  ♦ <  N03-NC  ) 

NBftSF = 1 

NOFF=NTil 

00  94  0  N4=l *  NO 

00  930  N  "-:=!<  NO 

00  9 10  N8=t .  NO l 

NP  1  =NpH  SE  +-N0FF 


479h  = 

1  pNPFF 

48  0ii= 

H  P  *! = H  P*  c‘  H  t  j  F  F 

431  0  = 

HF4=HP?pNPFF 

488  0  = 

HP '^»=HP4  ♦•HOFF 

48  30  = 

HP6=H~cifNPFF 

494  0  = 

NC'7=NP6  4-NnFF 

4  1  MS 

^  -  =.ri-  F 

4  u  = 

«■  4  >  -« = f  4 -r t- r 4 r  F  F 

4  -  11  = 

,  »*-  '«  m=;  •*-  -  c 

-  -  •  - 

-  f 

4;-.  -*n  = 

1  K  =  '••  P’  *  » 1 K  '  —  r» 

’  r if-  * 

49  00s 

P  '  HF;H:  F  :«  = '  P  • 

NOh  E 

4-M  Os 

T  »  =  P  •  H  C  ,•  1  4  P' 

•  NF  8  • 

4-4  8’lls 

T  c.  ~  4  P  1  HP'  «4  '  *”  ■  F' 

NF8) 

4'4  '-Ills 

.  P*  •  T  4  P'  c  ?  =  T  c, 

4->4lls 

T  1  =  P'  •.  1  iF  1  >  4  „•  P 

-  NF  x  > 

4-- -.11  = 

\  f*  *  t  4^  i  •  -  pr 

•  r  i  k  c;  i 

4  -*r- .  IlS 

C  *  r  4r*  \  t  =  I 

4'4r  ns 

T  4  =  '  P  »  HP  4  if'P' 

.  NP5) 

499  Os 

Tf.-’P • MP4 ' -‘P 

iNPS) 

499ii  = 

,'P  •: NP 3  1  =  T  1 4-T4FT7 

5  III  ills 

-p<NP4 >  =T l -T7 

5010= 

OP <NP5>=14-T 1 

MQO'.' 
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=1 II,--  II = 

.  P‘  *  M P  r.  »  =  1  /  —  T  4 

“,11  --11  = 

P  i  MP  1  1!  '  =  1  8  +  1  s  +  r.-. 

S  114  11  = 

h‘  *  Mh  ,  i  =  l  -  1 1^1 

“ill1-,  11  = 

_  p  1  f  1 H  ,7;  i  =  i  S  -  1 

c-  1 1 1 '  = 

p  ,  MP  -  .  =  1  ,---VS 

c.ll,'ll  = 

T  .-:=  ]  >  M  P‘  •  H  I  •  M P  h. 

Sn.:.i;= 

Tr.=  1  ,  rp  J-;  1  -  1  -  hp  -■ 

Si  I*  11  = 

•  ■"  I  >:  M Kh  ’  F  1  =  ‘  I  ■  NPP  '• 

Mlll'lS 

T 7  = r  I  •  rips'  >  h  i  .  mp  - 

SI  l  H  = 

T  2  =  -;  l  •  i  ip  7  •  -  I  •  ■  iF  2 

Sic  n  = 

:  I  'll.’  r  =  1  Pi 

Si  1 1 — 

T  1  =  :  I  1  MP'  1  -Hi  *  HP 

SI  4n  = 

T  3=  '  T  '.HP!  '  -  I  ■  MP 

Sis  11  = 

i  [  (IP  l  .*  =  1  7 

S  1  H,  U  = 

T4=S1  •:  nP'4  ■  h:;:I  ■.  hp  S 

si  7  0= 

T5  =  S I  1  NP4  1  -  i  •  fi P  S 

5  1  3  n= 

'  I  ■  MP  3:  ■  =  T  1  hT4hT7 

si  S|.= 

I  , 4  1  =  r  l  -  T7 

5200= 

31  ..hPS..  =  T4-T  1 

5210= 

SI (NP6>=T7-T4 

5230= 

SI  <NP  1  O')  =T2+T5+T8 

5230= 

SI (NP7)=T3-T2 

5240= 

SI (NR 8) =TS-T8 

5  £50= 

SI <NP9> =T 2-T5 

5260=91 0 

NT'.hSE=NBRSE  +•  1 

5270=930 

N  E:  h  S  E = N  B  RSE+NL  U  R  2 

5280=940 

N E:hSE  =NE;h 3E  hNL IjR 2 

5290=700 

5300=0 

IF (NC. NE. 7>  80  TO 

i  jc; 

1.330=0 


THE  FOLLOWING  COPE  IMPLEMENTS  1  HE  ?  POINT  PRE-l.iEftVE  MO POL 


NOFF=ND  1  »Npc' 
NF;h‘E=1 
Nl.  MF'2=:-:  ♦NflFF 
[  i ;  ,  4  1 1  :  •  4  =  :  i  r  1 1 1 

[10  rill  Nt  =  1'NOFF 

H  h  l  =  m  E *  t-  *-f-r iF F 

:i '  -  1  +-■  ■-  f 

Ml.-  •-:=MR.f-  4-mOPE 
MP4=NP  ?:  +-NHFF 
MRS  =  MF4+f<nFF 
MPh.  =  mPS  hMPFF 
N P  7 = f IP  i-'  HMOF  F 
MP.-;=MP7  KiOFF 
I  1  —  .  i  [  m|  1  t  pi  MP'h. 

I"  H.  —  (*'  I  Ml*'  I  '  -  H  i  l  iH  H  » 
T  4=  ■  p  I  MP- 4  '  ;•  P  '  NP' 3  > 

T?.=  f-R  <  NP'4  >  -  ‘  P  -  NP-:) 
T 2=  P’ •  NP'ir' •  *3E  •  NP'5> 
T5='.p.NP3)  -SPcNRS) 
SR  (NR5> =  T  P -T3 


5  £5  0= 

5 8  i"i = 

58  ?  0= 
5  O'-*  u= 
*=1391.1= 
5?m'i= 
591  0= 
598  i'i= 

59  30= 
594  ii= 
=.5  =.f'  = 

-i  F-  1 1- 


55711= 

'  P  >  TiP  8  1  =  T  5  T  3  f  T 1-. 

5  n  = 

"  P  > !  i  0  1  ■=  v  =.  -  T  p. 

-  S  ^  i « = 

r*  »  tr-  '  =  1  .  —  "1  ~i 

•  :  1 .  = 

=  T  :  -  T  1 

5f-.t  Its 

: p . hk4 . =  r 1  - T4 

» » = 

r  *  .  1  -  ,  ■  =  I  4  -  1 

”iK  -;‘i  = 

T  1  =  I  1  fl4t-T,- 

“,►-.4  MS 

;  P  I  f  i  r.  M  '  r.  1  =  P  1  !  1  P  M  r  ■  T  1 

Cir,-.  H  = 

r*  1  i  r*1  1  '  =  T  1 

“t-.r-.  Il  = 

f  1  =  1  1  •  *■  1  '  ; 1  ~  1-.  • 

St-,  r  1 1= 

1  r.  —  I  '  ,  ' -r  1  •  —  1  1  ,ii»  r,  1 

55;-:  l'i  s 

14=  I  ■  Nr  4  1  +  I  1  HP  ;;  1 

>.  4  0  = 

T  .3  =  •  I  '  or-  4  '  -  I  '  1  1  ^  -  > 

S  7 1  i  1 1  s 

18  =  [  1  -,P  -  1  «•  '  ]  1  :i~-,  1 

571  fis 

T5=  l  1  r-fP'o  •  -  I  '  rii*  5  > 

57?  Hs 

-  I  1  NP'5  1  =  TF.-T 

57311= 

I  •:  NP 8 )  =  T  5  -t-T  .3  T£. 

574  0  = 

•  1  ,n~-  . slS-TE 

“l  “iltS 

,  1  •  =  1  -  1 5 

5780= 

SI <NP3>=T8— T1 

5770= 

SI <NP4> =T 1 -T4 

578it= 

SI <NR7> =T4-T8 

5790= 

T1=T  1  +-T4+-T8 

5800= 

SI  <NFhSE>=SI  (NF:hSE>  *T1 

581  0= 

S I <NP 1 >  =  T 1 

5880=71 0 

N£hSE=NPhSE+-1 

5830=740 

NBh  :  E  =NBE*S  E  ■pNLUF‘8 

5340=500 

IF-NB.NE.5'  RETURN 

THE  FOLlQI'I  I N8  CODE  IMF'LEMfcN  I iHt 


NO E  F = n  ft  t  •  N  0  ♦  m  [i 3 
NBhSE=1 

DO  510  N  t  =  t  '  Ni'iFF 

*  *  4  1  =  T  ,  p  4  '  F  fNT'r  p 
R  1  F 

.  --  4  =  •  •  *•  •  -  P 


POINT  PP£ -infi-ivt  ft  jlu.a.r 


c.u  i.m  = 

14=  •  K  •  ilf*  1  ‘  “  _  p  •  ♦ 'iN-  4  • 

s  0 1  0  = 

T  1  =  ‘  K  »  N**'  l  <  ’  P  1  NP4  » 

0  ».i  c*  0= 

T  3  =  *  P  Mr’  3  >  ¥  >  R  *.  NR  4  .» 

r.  II  :•  n  = 

T  C  =  P  *  NR  •'  “  P  *  N  P  c  * 

r.  Il4  1 1  = 

j  P  <  NP  *i  •  =  T  1  —  T 

r.  05  l‘i  = 

_  P'  •  NP‘  t  '  =  T  1  T  *. 

4.  1  In*  1  *  = 

”  p  1  i  —  ”  C'  i  N  V 4«  ‘  4 

0  u O  = 

j.p  Ht*.  5  •  =  1  4  1  4 

oU*f*0= 

SR  cUR8 • =T4 

Ol’r4 11  = 

SR ' NR 4 ' =T8 

61  0U= 

T4=S I < NR 1 ) -SI < NR4 ) 

r>  \  1  0* 

fl  =  SI  •  MR  1  >  4-Sl  *•  NP  4  > 

^  V  K*  I  | 
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I-.  1  .-11= 

1  —  I  1  f  1 K  *'.  * 

r-.  [  i » = 

T:=  f  »  .  » 

1 4 1  >  = 

.  1  1  fit*’'  >  .•  =  i  1 

-.1  '  "  = 

'  i  i h~  i  .  =  r  i 

r.  ]  r-.  H  = 

I  ■  H  P  6  ,-  >  = 

>-•  1  7 1 1= 

I  >  1 

i-.  1 .-.  i'i = 

E— 

II 

i’lj 

.1 

6— ' 

r~.  1  4|JS 

1  , MP4  - =T3 

r-»  1 1 1 » —  *i  |  1 1 

nf.H  '  f  =n  t  ri 

1  1  .  (l-  - 


CM-’  1  1.1  = 
r- I '  = 

6 I '  = 

r.,-4l'  = 
►-.350= 
r.  c  6 1 1  = 
c.c.Tn  = 

r-, -.11  = 

lie' 90= 
6900= 
631  0= 
633  0= 
6330= 
6340= 
6350= 
6360= 
637 0»C 
633  0=C 
6  3  ‘90=C 
640  0=.; 
641 0=f 
643 0=f 
6430=r 
644  M  = 
6450  = 

6. 46.  |'|  = 

6  4  7  0  = 
646  0  = 

I-.4  ->!•  = 

n  6 1 1 :  /  = 

I  m  = 


£  T  U  <»  1 1 

r  M  fl 

!  ' 6‘ r"; !  t  T  I M  —  Mi  1 1  T  '  ■»'  *  ;•  1  %  l_  l  j t-  6  *  rip! I  'I  |  * 

..  ill i* l  .'1 1 1  r  i  t  i  ri > f  1 6.  *  ri<  «  m  fi  •  h  f  1 1  «  nli,-*'  m  1 1  -• « 1 1 1 i4 

PfeHL  iK  *.  1  1  •  I  •  1  •'  «  CGcF  *  1  > 

00  in  J=  1  .  firli.H_T 

■  t  ,_l  i  —  r-  '  I  ‘  ♦  I  i]rf"  1  J  ' 

.  I  i  .1  •  =  I  •  I  1  ..’f  f  i  .1  i 

CO NT INUr 

RETURN 

END 

SUBROUTINE  MEH VE  3 1  >R«  S I  1 
PEmL  SR  <  1  >  » 3 1  <  1  > 

COMMON  Nh « NB  • NC «  ND«  ND 1 « NDc  *  Nli  .-i*  NB4 
PEML  >?  1  3  >  iT  <16) 

IF  <ND. NE . 5>  60  TO  ?00 


THE  FOLLOW  IN*  CODE  IMPLEMENT.  I  HE  5  POINT  FOS  T  -i.iERVE 


NOFF=ND 1 ♦nDc  *ND  3 
r<FH'E  =  i 

DO  510  N 1 = 1 « nQFF 
NR1=nPh'E*-N.-ff 
M  E' .-’  =  M  1  ►  I i>  •  F  F 

I  4  = .  <  l*  ■,  f  i  t  r~  r 
I.-  -.  =  <  -  4  ►  •  -  - 


c.  -.11  = 

r.54  0  = 
655 1 1  = 
65611= 
r. 5  7  0 = 
653li  = 
r-.  -  4 1 1  = 
r-.r.ims 
66  1  0  = 
663  0  = 
r>  r.  >11  = 

664  0= 
6650= 
6660= 


i  -  =  r  l  - 
T 1  =  T  1  p  P  •  n  R 
T4=  [  ■  M-  3  •  p  [  '  f<4  =.  i 
T 3  =  1  '  HP  4  '  P  '  I  '  HP  S  i 
'•R  •  HP  l  >  =  T  1  -T4 
P  4  =  T  1  P  f  4 

r  =  I  -I  p  r  .- 

r*  '  ■  *  —  --  *  =  (  -  i  r 

T 1 =  : I  • Nf.H ' F ■ 6  7  I • NR 1 > 
T 3  =  T  1  -•  I  -NP 
T  1  =  T  1  P  '  I  •  HP  • 

T  4  =  "  P ■ HP 3 ■ P  'P  • NFS) 

T 3  =  ’  R  1  NP’4  i  p  P'  •  HP  5  i 
31 <NP1 >  =  T 1 pT4 


! 


i 


f 


4 


( 


i 


6670= 

si  (NP4>  =1 1 -T4 

"  I  1  N  i  =  T  -T  8 

r- .-.  M  = 

■  1  . ,  w  •:  >  =  I  TR 

r.  ,'Mi2 

'  r‘  i  f  V' 

r.  i  1  0  = 

;-.r  i  i'lf*  4  1  =  ^4 

r-  !'»=—•  1  1* 

^  -:''M  r  *•  1 

6  r  7.(1=  7  i  t  1 1 

IF  » M !  .lie.*  i  1  c*D  ^  l) 

K  4  I  i =  i 

6  ?SO  =  f  ♦  * 

F  n=i 

[Hr  PCl'-Lu'*1  '  '  i*i  •  i'1  i'r  Ir1i-'Lt6  ■ 

h?.Sil=l 

6790=i.  *♦ 

^•-;r,n=c 

66  1  0= 

hijPr  =r-ui  1  *n['c 

l~V 1  1  — 

NBhSE=) 

NLu6c=S*ih’|i-F 

68.4  0  = 

LlO  74  0  N4  =  1  *  f'i 0 

rr-  1 1  = 

DCs  71  0  ii  =  1  .  .'i.JFP 

6860= 

NRl=fiBH.St  +-NGFF 

687  0= 

NR£=NRl*N0FF 

i-.tfy  ii = 

NR3=HR  6  +NC1FF 

6  8 '4  0= 

NR4=hR3  4-nCIFF 

69(1 0= 

NP5=NR4*N0FF 

6910= 

NR  6 = N  R  5  +•  N  0  F  F 

0= 

NR7=NR6+'NCiFF 

6930= 

NR3=NR7  +-N0FF 

K'4*4  l.ls 

T 1  =iR  (NR  1 >  +SR  <:  NBh.SE> 

6950= 

T8=T  1  -SR  (NR3>  -SR  <NP4> 

6960= 

J4=T  1  *SR  (NR 3')  -SR  (NR7> 

6970= 

Tl  =  Tl+-SR  (NR4»  6  SR  •NR7> 

6980= 

T6=SI  <«R8'»  +SI  <NR5 >  +S  I  (NR8  * 

6990= 

T5=SI  (NR 8:'  -SI  (NR5>  -SI  (NR6> 

7000= 

T  3  =  S I  1  N-  f ‘  >  *■  S I  '  N R 6  >  -  $  I  •.  N P  6  :> 

7  0 1  0= 

SR (NR  1 > =T 1 -T6 

7080= 

SR6=T 1 *T6 

70  30= 

SR8=F8-T5 

7040  = 

SR5=18+-T5 

7  08  0= 

-p  .  MP4  .  =  T  4  - T  *• 

,  1 1  f,  M  — 

i ▼  i  f  4  *■*'  ■  •  =  *  -X  T  ’> 

?  1 1  r  r* = 

T  1=  '  t  1  NR  1  '  *-  I  '  fi[.R;.  F  ■ 

H.-  M  = 

T  =  V  1  -  I  1  ;'iH  1  -  1  1  f  1C  4  1 

r  1 1  -*  >  >  = 

[  _i  =  !  1  I  <  .  -  i  1  • '  -  , 

7  1  Oii= 

T  l  =  T  1  I  1  N»  4  •  1-1  '  NP?  > 

7 1 1  0= 

T6=SR  <  NR 8 )  +  SR  <  NR5  >  ♦  S R  >  NR 8  > 

7180= 

7  *5  =  **.  p  v  (  IF  c‘  >  —  :  P  «.  Nk  S  —  £  P  •  NP  F  » 

7 1  3  0= 

T3=  SR  ( NR8>  * R  '  NF6>  -SR  (NR8  • 

7  14  0= 

SI  < NP 1 ) =T 1 hT6 

7150= 

SI (NR  6 ( =T 1  - 16 

716h= 

S  I  1  NR  8  >  =  1  8  +-T5 

71  ."  = 

-  i  i  MPS  .  =  1  - T  S 

7180= 

SI (NR4>  =  T4  f 13 

7190= 

Si  (NR 3')  =T 4 -T  3 

7800= 

SR  •  NP  8  >  =SF'8 

7810= 

SR  (NR 5>  =  SR 5 
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786  0  = 

K  *.•  N  K !•  —  ,'j .  fr  r. 

7 ,-  3  n  =  7  1  ii 

f  1  f . H  f-  =  M  Km  r  1 

7c  4  ii-  '4  ii 

f  1  h  4l  ~  4  =  i  f.6  '  F  4-NI.I.IP8 

,.\^-,h=  >:  1 1 1 1 

I  F  * . i B •  c.i-  .  1  *  I’U  TO  4i.ui 

7  rV.  0= 

IF'NB.NE.3'  30  TO  600 

7670=7 

7l£*;iiri=r  +» 

7,^*9  n=r 

7.;inn=i' 

TH<~  i-OLLO1'1 1  NO  r  ODE  IMPLEnFni-. 

1  Hi-  3  POINT  4 

I 

;  i  u-=. 

7::,-n=i.  +  ♦ 

7":  3  n=r 

7  54  0  = 

Hi  i.iP6=6  +  mM 

7  35  M= 

Hi_ijF63=  ::*i’M  *  i.  N [i 3-.ii”  ■■ 

7  if«  0= 

N  Dh.SE  =1 

7  37  0= 

NGFF=ND1 

73.-:  0  = 

DO  34  0  fi-.=  i,hD 

7  *:  }fi= 

DO  3  3  0  44=] »N" 

74  00= 

DO  3 1 0  N8= 1 . N  D 1 

74 1  0= 

NR1=N&*5E+N0FF 

7430= 

NR3=NR1 +NQFF 

7430= 

T 1  =SR  <  NBh.SE  >  +SR  < NP 1 ) 

744  0= 

SR  (NR  1  >  =T  1  -SI  (NR8> 

7450= 

SR8=T 1 4S I (NFS') 

7460= 

T 1=5 1 (NB*SE> +51 t  NR 1 > 

7470= 

SI  (NR  1 )  =T  1 +5R  «: NR8> 

7480= 

SI (NR8)=T1-5R(NR8) 

7490= 

SR < NP? y =xPv 

7500=310 

NB85E=NB8SE+1 

751 0=3 30 

N  B  6  5  E  =  N  B  H  S  E  +  NL  iJ  P  S 

7580=340 

N  B  H  *:  F = N  B  H  5  F  +  N  L 1 J  P  8 .3 

753i'i=60  0 

I F  ( NB .  NE .  9 GO  TO  40 0 

755 0=C 

756  fi=r 

757  0=i.' 

75  8  O-i* 
7  s  =4  n  = 

,  f— .  Ill'  — 

76  1  0= 

r  1 1 = 


7  if, 4  n  = 
76.5  0= 

766  0= 

767  0= 
7660= 
766  0= 
7  70n= 

:  1 1>= 
77c'  0= 
77  60= 
7740= 
7750= 
7760= 


THE FOLLOWING CODE IMPLEMENTS THE 9 POINT POST-WEAVE


Nt_l.iP,?=l  0*HM 

rn.  i.i  -  -  =  1  1  ♦  r  i I .  i  *  i  r<{  .3  -(<• 

NOPE  =ri  fi  1 

DO 940 N4=1,ND

DO  630  N3=i»Ni_ 

DO  910  N,-:=1.ND1 
NRl=NB:Mf  E+NflFF 
NPc'=NR  l  +  Mi>  F 

i  <  i»' .  =  C  +  •*  * 1  >t~  t~ 

;  *  -  4  =  1 1  ^  t-  *  * .  •  -  ♦*• 

NP5=NP4*M"iFF 
NR6=NR5  fNriFF 
NP7=NF6+NriFF 
MPft=NP7*N0FF 
NR9=f  4F"6  ♦Nt'lP  F 

77  7  n= 

HR  1  0=NF"4*NnFF 

! 

77;-:n= 

T  -:=  F  »  N  r*  r  i  - 

Hr  « 

f  » 

; '  7  ~m  ■  = 

T  i  -  ■  i*'  1  I'IKh  *-  *  4- 

'  &  ‘ 

!•♦*-  1  .» 

,c  ir»— 

.  <*■  i.  f  l  hH  E  =  j  R  ». 

r  Hr 

t  *  ¥  \ 

R  '.MR  :: 

t  u  = 

TF.  =  1  -:r  1  .*-.4  1  0 

. 

7  n  = 

F  i.  r.F  3  •  =  r  T 

»  HR 

)  II  * 

-1 

7  7.  1 1  = 

1  4—  (  7  F  F  1  i  lr*  ~ 

-  \  R 

•  ”  4 1*  r^.  .* 

7  $4li= 

T  1  =  T  7  —  '  t*  .  fi  F  4  ' 

—  9  IT 

•  HkCi  1 

i 

7  0  c,  1 » = 

T7=T7F  ■■  F  •  Nr*  4  :■ 

+■  r* 

»  H  Z  R,  .« 

1 

7*  y. 1 1  = 

■*  •  IMF  r,  •  =  1  r. 

7S7  u= 

i  3  =  ;  I  1  NR 8  .•  -  1 

‘  mR 

r  •  - :  i 

•  r  i  R‘ ) 

737.1;= 

T  ~i  =  ;  I  •.  NFc  .«  >  I 

•  HR 

-  »  —  ”  T 

1  J  «  f.  -4  » 

70*0= 

T  £  =  *  I  .  ;MR8  '  F  c.  I 

1  H  ‘~r’ 

7  .  F  *  I 

■  N- ^  , 

79m  i.i= 

F  •.  NR  3  .'  —  I  ,  —  T  £■' 

i7’-4 1  o= 

1k;-.=T7fT8 

7900= 

R  ( NR 4  ■'  =  T  1  —  T  .m 

790»*i= 

SR  •  NFS  .>  =  T  If  T8 

7  9  4.  h  = 

IF  7=1  4  -  T 

7950= 

SR8=T4fC5 

7960= 

T3=SI  (NBhSE)  - 

SI  <■ 

NR3> 

7970= 

T7=SI (NBhSE > f 

S I  ■■ 

NR  1  > 

7980= 

SI  <NB9SE *  =SI  C 

NBA 

SE  »  f.S 

I  <HP3 

7990= 

T6=T3-SP  (NR 1 0 

> 

800  0= 

S I  C  N R 3 ">  =  T3f S R 

>.NR 

1  0." 

801  0= 

T4=T7>SI (NR 5 > 

-5  I 

<NR6’i 

8080= 

T  1=T7-S  I  ( NR4  :• 

-SI 

■:  NP5  > 

8  03  0= 

T7=T7  fSI (NR 4) 

FSI 

1.NR6') 

8040= 
3050= 
806  0  = 
807  0= 


'  I  (NR'6>=T6 

T 3= SR  1  NF'c >  -SR  1  NR7  -SR  (  NR3"  1 
T5  =  *  NR  8  ■'  F.SF*  '■  Nkr  —  R  *  NR 8  ■' 
T£=SR  <  NR£  .'  F.SR  (NR7>  +SR  <NP9’> 


8 

r 

0 

0= 

SI 

(NR1 

=  T7 

HE 

3 

c 

0 

0= 

SI 

('NR  8 

=  T7 

-T8 

3 

l 

M 

0= 

•SI 

CNR  4 

1  =  T  1 

FT8 

c 

l 

1 

n= 

SI 

CNR  5 

•  =  T  1 

-T8 

3 

l 

cl 

0= 

SI 

(NR  7 

=  T  4 

K5 

3 

i 

“« 

0= 

SI 

'Nr  »*' 

=  74 

-T5 

3 

l 

4 

l'l= 

SR 

(NR  8 

=  p 

- 

i 

S 

1.1  = 

r* 

.  N  R  7 

i=i* 

3 

i 

R. 

11  = 

•  K 

.  f  -t  F 

=  •  P' 

,r; 

i 

T* 

11= -t 

•  *b 

n  F  =f 

'f-H.rtl 

-  1  i  •  -  -  ■  ■ 

8 1  -p«  li  =  *4  II 

8900=400 
88 10  = 
3?80=i* 
9830=1*  ♦ « 

884  m= i' 

■  S  n  =  r 

1 1  —  i 

38<'0  =  i.  *> 
883  n=i* 
8890= 

8  9  0  0= 
8310= 


.  f  r  -1  -  =  .  1  f\  M  r  f  ,1i  »  i 

NEFi  r  =N  Oh  F  FTIi.  IjF  8  3 
I F  (  Nh  .  E i? .  1  RE T URN 
IF (NB.NE.4) GO TO 800


THE FOLLOWING CODE IMPLEMENTS THE 4 POINT POST-WEAVE


NLUF8=4 ♦ 'Nfc-NF* 

N L I J F- 8  3  =  4  ♦  N T i ♦  '  NO? -NO 
NEmSE=1 


3if 


x  ii  = 

‘  4 1  ■  = 

85  n= 

1 1  — 

3  r  fi  = 

11  = 

- 1':= 

4. 1 1  m  = 

41  11  = 

4C'II  = 

4.  ili  = 

441. = 

4  ”i  n  = 

4  A  n  = 

4?  i'i= 

4X11  = 

4  ->ll  = 

8500= 

851  0= 

858  0= 

858  0= 

854  0=48  0 
8550=480 


on  44n  ri4=i , nii 

D  i  4  -  0  IS  •;=  I  nil 

Dt  i  4  -  u  !  -’ =  t  ,  ;i4 

Mi-- 1  =  flBG  F  4- 1 

i’  1 ,-:=,  In  1  X  1 

r  i  s = r  i  x  x  x  l 

TRli  =  ir'  i  Nf-  ri  ■  F  '  x  :■  F  1  f i F  > 

J  [•'  L~  =  r-  i  1 1  F  rl  r  '  —  x  '  *  l  x  X  * 
1X1=  —  1  f  1  ‘*1  >  X  ■  x  1  '  —  -  • 

1  F  =  x  *  r  i  x  )  "•  -  F  i  •  •  •  "• 

Til  =.'■•  I  1  (IF  1  '•  4-  ■  I  i  Nx  8  i 
T  I  8=  :•  [  1  [IX  1  1  -  1  i  !  tlx  • 

;  X  ’  fl  1m  X  I  =  T  Fl  1 1  4-  1  x  1 

x'  li  F',-'  >  =  T  x  1 1  —  T  F'  1 
SR  ,  NP 1  :■  =TF\?  4* T  I  8 
SR  ■:  H  F  8  >  =  T  P  x  -  T  T  8 
T  I  0=  '  I  >:  r-i  Fh  *  F  >  x  ~  1  >  tiF  - 
T  1 8=  i  I  1  n  Bh  F  •  —  I  •  fix 8  > 
SI  (NPhSE >  =1  I  04-T  1 1 
i I (N88>=TI 0-TI 1 
SI  >  14  F‘  1 )  = T I  £  -  T  R  3 
S I  i  HR 8  ••  =T  1 8  +-TR8 
N  B:  h  S  E = N  B  H  S  E  +■  4 
N  BhSE =HBhSE  4-HL  UF‘8 


8560=440  H  B H  S  E = H B  R S E  + H L I J  P 8 3 

8570=800  IF (NB.NE.8) GO TO 1600
858 0=0 


859  0=0 
8600=0 


86 t 0=0 
'-I  i cl  U  —  C 


THE FOLLOWING CODE IMPLEMENTS THE 8 POINT POST-WEAVE


8630=0 

864n=0 


8650=  NUJP8=8* 1  HD8-HB  > 


Xx.x.  1.1  = 

H  L  U  P  8  3 = 8  ♦  14  D  8  ♦  •  n  1*3-  H  0 > 

867  0= 

HBhSE=1 

Xx.x  0  = 

DO 840 N4=1,ND

F.  x  fi= 

Dli  6  8  0  H  -  =  1  «  tiF 

8  7  0 1 1  = 

[if]  ,88  1 1  pi, 5=  1  *  ii  B 

-:71  0= 

t';v  1  =Hf.G  ‘  x  4-  ’ 

3  7 u  = 

r  44  -  =r,X  ]  x  ) 

r  '-,ii  = 

14-  '=  ix,-'  4-  1 

374  0= 

HP4=HP  5  4- 1 

375li  = 

HP=.=Nx4  x  1 

$780= 

HF  6=nx'5  x  1 

7  7  0  = 

HR7=HPx.x  l 

3780= 

T  l  =  x'  '  MBG  F  ■  -SR  •  HR  l  > 

x  7x  u~ 

-p , Mfx'F > =r  R • MTR 'F -  x 'P • HP] 

-;xiim  = 

p  X,  =  P  ,  tlx1 I  X  1  1  f  1 X  -*  » 

1  0  = 

•_--F  1  Mr  c'.1  -  R  >  t  ix\-'  *  —  -  i  '  1 1 F  ! 

3880= 

T4=  >R  >.HR 4 j  -SI  •.HR5> 

58811= 

T5=  3R  ‘  HR 4 )  *■  S  l  1.  HP 5  1 

83411= 

T  6=SR ‘ HP 7 > —S I  ' HR6  > 

3:8511= 

T  7  =  ’  R'  ■  H  x'  7  1  :•  I  •  H  F'  <5  1 

8860= 

SR (HR4>  =  T  1 


£  *  HP  1  1  =  T  4  f  T  k 
f*  1  4  -  I  r, 


-<  1 1 1 1  = 


rr  1  ‘  t  -  ,'l  =  1  .  4-  f  , 


-•  1  i l— 
,4.-11  ~ 
>;■-«  ||  = 
'-44;.= 


I  l  —  .  I  •  1  I  Sri  h  '  -  ;  i  *  ■  -  l  < 

i  '  n  f  m  i-  1  =  t  .  r<  (- 1-  *  -  *  4-  ■'  I  .  r  *4.'  |  1 

T '  1  1  r  1  >  •  r i , 

1  1  H*'/  I  =  1  n-r •  4- :  P  1  r  4P-  4,  , 


-4 7  n = 


‘4|||||l  = 

■4 1 1 1  |  1  = 
■4 1 1,4  11= 

■4 1 1  -  j  ( = 
■4|I4m  = 


T4-.  f  ,  HP  _j  .  '  4  .  SP-  . 

I "  ~  :  I  *  T 1  -  4  •  —  4  1  r4c  i 

I  >'  N  P  P.  >  =  1 

]>•=  -K  1  HP  !-.  •  r  ;  1  >.  fit*  7  :• 

I  .  S  ;  }T  t  i  *1*  J-  I  -  i  1  1 V1- 

;  t  >  NK4  '  =  11 

.4. 1  NP  1  )  =  T 4^T r, 

-.1  ,  HP  >  =  T4-TP. 

*  I  >  HPS  1  =  ■'  “■  4-  T  7 
i  I  .  Hf.v  =1S-1  7 


•=•  riff.  0= 

908  0= 

9  ii7  0= 
9080=95 0 
9090=990 
91 00=840 
9110=1 6 0  0 


SR <NR3> =SR3 
SR  *:  HP5  >  =SR*4 
S  R  '■  N  R  6 R  6 
N  B  8  s  £  =  H  B  8 .'  £  +■  9 
NB8  '.£=N  Fh-F  vr-tLURE 
H  B  8  S  £  =  N  B  8  S  £  +NL IJ  P  c’  3 
IF (NB.NE.16) RETURN


9130=1: 

9130=C 


9140=C 

9150=C  THE FOLLOWING CODE IMPLEMENTS THE 16 POINT POST-WEAVE


9  t£.0=r'. 
81 7  0=C 

■4  j  9 1 1=1 ; 


91  Qfis 

NL  UP.?=  1 8  ♦  •  NT’S  -N  B  > 

fi  11  = 

ns.  UP5  9=  1 8  *N  08  •  •  Nit  3 

9?  1  ft= 

NB8SF=1 

-*rv:n= 

Lli'l  1  r.  4  1 1  144=1*  HO 

f  1  "1  1  *4.  1 1  N  •  r  1  <  hi' 

4.  n  = 

i'.  j  lrch  .  <  -  =  1  *  i  1  r 

hv  1  =  ■•<  l  c.  •-  «.  t 

4  -  — .  n 

1  *  ' 

**  ■  ,  * .  — 

;  1-  =  .  .  .  t  ; 

9r m  = 

HP’4  =  NP  4-  1 

4.-'  -4  n  = 

r4pc,=HP4  «- 1 

9 m  1 = 

HR£.=14pS  4- 1 

9':  1 11= 

H  P  7  =  1 4  K  h  4-  1 

n  = 

HP9=r 4P  7  4-  1 

•:  i :  = 

HC  4  =  'jc.  :-.  f  1 

•  4  ('■  - 

tic.  1  ie-j  4-  1 

•4  -.-,  ||  = 

9o‘fS  1  »— 

■-*  -i/  I  ls 

q  r*  n  = 

94  0  0= 


HI*  1  1  =MK  1  IH-  1 

NR  1 5  =  HR 1  1  M 
NP  1  -1  =  HP  1  4  4-  1 
NR t  4= NR  1 3 M 
HP  lc.  =  t4p  14  4-1 
NR  1  b=NR  15  4-1 


MOGUL 
-i4  1  l.=

r  1  = r  «>*■  1 1 

U.-  I,= 

l  '  .  *  -  ,r  1  .  ♦  f  .  i-H  r 

„  _  1 

T*  l 

J4  -.11  = 

r\ 

1. 

T 

tl 

V 

♦  rr  |  1  *■ 

■  p  i 

J-14  11  = 

1  '  4  *  —  •  I  i  t*  c  .» 

J  •  .  tf- 

44S  !'  = 

T  •  1  —  ’  f»‘  •  ,  -•  1  — 

I  » 

-  1 

11  = 

7  t  S  1  =  p  »  r <  K- 4  .»  4- 

}  < 

-  * 

->4  .’M  = 

1  r  lr'  lH}r  4  1  - 

!  ‘Mv 

- .  1 

ii  i  - 

1  1  1  -  —  1  *  "  t !-»  f— .  t 

- 

-4  -l-.r- 

}  *  ,  ♦  —  —  ]  :  i.t  h  • 

¥  1  r« 

5E.  li  |l- 

X  •  *<  1  -  •  P  1  riP .*  4* 

-  p  .  r-<P 

1  4  > 

4-,  t  11  = 

T  1  ]  F.  1  =  **  <  NP  • 

-  P  .  j  4 

-  1  4 

->S  -  |  1  = 

T  •  i  :  '  riKi 

r. »  -  .  1 

.  :  ,  : 

4-.  |  ,= 

T  1  1  1  1  =  '.  1  1  NR'  1  1: 

:  I  * 

1  ->6  1 

■4S4  1':  = 

T  1  6  ■>  =  ■  :•  6  '  H  4  1  5 

—  7-  P  1 

NR  1 

'4SS  i  i  = 

T  .  1  ,  =  -  6  .  nr  1  1 

1  — 7  p  r 

■4R  1 

'4-'-.  ii  = 

t  ■  1  1 1 1  =  -  ;  t  -4-4 

*  —  1  1 

-  1 

4"'. ?  11  = 

r  >:  14.1  =  -  ,  l  .  nR  1 

6>  +  5  I 

>.  NR 

958  0= 

SR  'i  HR  9 )  =  T  •:  5  •'  *T  '•  7  > 

959  n= 

S P6=T  (5  ;.  -  T  •:  7  > 

96  0  0= 

SRI  0=T  1  6  1  4-T  (8 

j 

961  0= 

SR  <HR 14 > =T (6> 

-T  >  8) 

9620= 

0  (?)  =  T  >:'  9  4-T  <  1  0  > 

9630= 

Q  >33  ')  =T  < 9  >  -T  *  1 

0  :• 

964  0= 

Q  ■:  1  :■  =T  >:'  1  1  :>  4-T  ■: 

19> 

9650= 

•  1.5  >.  9 )  =T  (11 >  —  T  ( 

19) 

9660= 

f.i  >:  4  :•  =T  ( 1 4)  4-T  >: 

15) 

9670= 

Q  <5>  =  T  >:  15>  -T  • 

144 

969  0  = 

i'i  >'  -::>  =T  >:  1  1  4-T  >: 

16) 

9690= 

i'i  6  :•  =  T  >:  1  3 )  -T  •: 

16) 

97  0  0= 

SR  1  NR  1 )  =i;>  >.  3  :>  4-0  >.  7  * 

9 PI  0= 

SR  >•  NR7 )  =,:i  •  7  ■>  - 

I'1  •:  S:  :• 

9, 7  8  0= 

•••  R  9  =  ':*  1  8  1  4- 14 ' 6  > 

9730= 

SR  <  NR  1 5  1  -9  >  3 ) 

— 1?  1.  6  ) 

974m  = 

S R5=r.i  1  1  >  4-i'.i  >  4  > 

975  0= 

SR 3= 1*1  •:  4  •  -id  1 1  ) 

97611= 

>1  3  =  l"'  1  1  4-l'.*  >.  5 

- 

4  ;•  7  fi  = 

4  11  =;■>  ■  6  ■  ■  - 

-• 

'4  ||  = 

K  '  H  4' 1  =  1  *  l  ‘ 

^  r  -mi— 

•  R  .  - 1 :  -1  .  =  r  ■  •  ■ 

t-  » 


lii-  \  ■ 

989 0=  _  [  •  f  i  hH  _  e  1  =  ’  I  1  Hr-  1  i  «-  ...  ;  i  H f, m  ’■  t  > 


,:*\z  :•  it  = 

T  •  4  > 

=  1 

1 

NR.-'  ■  -  '  R  1  NP  3  ' 

-*w4  n= 

T  •  ri  :• 

=  :  l 

1 

f  ir-  c'  -•  4-  P  1  NR' 

Sii  = 

r  >.  6  • 

=  1 

( 

1  i 6  — i  •  —  ;  R  •  fiR' 5  1 

k  n  = 

r  >  5 ) 

=  ‘  1 

t 

N R  4  >  4-  ~  R  >  f  iR  5  1 

r  0  = 

r  >  -:  > 

_  -  r 

1 

lit  (-  i  —  '  J  1  T'k-  7  1 

■-»  1  >  - 

T  •  7  • 

=  *  c 

• 

•  4-'  !  .  -.=  ?> 

•4S  4  11  = 

I  •  r*  1 

=  :  1 

1. 

[  1  r-  j  1  Hr-'  1  4  .1 

•4  41111  = 

T  1  1  c 

.1  =  ' 

I 

•  14  R  ;-!  >  —  >  1  iflK  14) 

•49 1  0= 

T  >  1  - 

>  =  " 

h 

•  1 4  R  t  0  >  4- '  R  >  NP  1  9  > 

9*4^0= 

T>11 

>  =  s 

R 

>  NP19)  -SP  *  NR'  1  0) 

^  -4  5  n= 

T  >  1  6 

1  =  ■ 

I 

'  NR  1  5  >  -  :  1  1  NR  1  7  > 

9441*1= 

T  •:  1  2 

1  =  s 

I 

>  NR  1  1  •  -S  I  •  NR  1  7  ) 

995  fi= 

T  •  10'  = 

R 

C  HR  -51  V  ‘  R  (HP  1  6  ) 
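The modules in the preceding listing handle the short lengths 3, 9, 4, 8 and 16. Winograd-type short-length DFT modules share one structure: pre-additions, a small set of multiplications by fixed constants, and post-additions. As an illustration only, the Python sketch below shows that structure for a 3-point module. It is not a transcription of the FORTRAN code; it assumes the kernel exp(-j2πnk/N) (reversing the sign of the sine constant gives the +j convention), and `winograd_dft3` and `direct_dft` are hypothetical names used here for the example.

```python
import cmath
import math


def winograd_dft3(x0, x1, x2):
    """Winograd-style 3-point DFT with kernel exp(-j*2*pi*n*k/3).

    Structure: pre-additions, two constant-by-complex multiplications
    (4 real multiplies), and post-additions (6 complex adds in total).
    """
    c1 = math.cos(2 * math.pi / 3) - 1.0  # = -1.5, a real constant
    c2 = -math.sin(2 * math.pi / 3)       # = -sqrt(3)/2, applied as j*c2

    # Pre-additions
    t1 = x1 + x2
    t2 = x1 - x2
    X0 = x0 + t1

    # Multiplications: one real constant, one purely imaginary constant
    m1 = c1 * t1
    m2 = complex(-c2 * t2.imag, c2 * t2.real)  # j*c2*t2 using two real multiplies

    # Post-additions
    s1 = X0 + m1
    return X0, s1 + m2, s1 - m2


def direct_dft(x):
    """Reference O(N^2) DFT with the same exp(-j*2*pi*n*k/N) kernel."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts)
                for n in range(n_pts)) for k in range(n_pts)]


x = [1 + 2j, -0.5 + 0.25j, 3 - 1j]
print(winograd_dft3(*x))
print(direct_dft(x))
```

The two printed triples agree to rounding error. The saving over the direct sum comes from sharing t1 and t2 between both nonzero-frequency outputs.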
Appendix I. Computing the Prime Factor Algorithm (PFA)

This program computes the DFT defined by

$$X(k) = \sum_{n=0}^{N-1} x(n)\,\exp(+j2\pi nk/N), \qquad k = 0, 1, \ldots, N-1$$

where the sequence length N is a product of relatively prime factors from the set (2, 3, 4, 5, 7, 8, 9 and 16). This algorithm was proposed by Kolba and Parks in 1977 and was modified into the program presented here by Burrus and Eschenbacher in 1980.
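The defining property of a prime factor algorithm is that, when the factors of N are mutually prime, a re-indexing of the input and output turns the length-N DFT into nested short DFTs with no twiddle factors. The Python sketch below illustrates that index mapping (the Good-Thomas mapping) recursively and checks it against the direct sum, using the +j sign convention of the equation above. It is a plain illustration under those assumptions, not the in-place, in-order indexing of the Burrus-Eschenbacher program; the short DFTs are computed directly rather than with Winograd modules, and `pfa` and `dft` are hypothetical names.

```python
import cmath
import random
from math import gcd, prod


def dft(x, sign=+1):
    """Direct DFT: X(k) = sum_n x(n) exp(sign*j*2*pi*n*k/N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / n_pts)
                for n in range(n_pts)) for k in range(n_pts)]


def pfa(x, factors, sign=+1):
    """Prime factor (Good-Thomas) DFT for len(x) == prod(factors), where the
    factors are pairwise mutually prime. No twiddle factors are applied."""
    if len(factors) == 1:
        return dft(x, sign)
    n1, n2 = factors[0], prod(factors[1:])
    n_pts = n1 * n2
    assert len(x) == n_pts and gcd(n1, n2) == 1
    # Input map: n = (n2*a + n1*b) mod N, for a in [0, n1) and b in [0, n2).
    rows = [[x[(n2 * a + n1 * b) % n_pts] for b in range(n2)] for a in range(n1)]
    # Length-n2 transforms along each row, recursing over the remaining factors.
    rows = [pfa(row, factors[1:], sign) for row in rows]
    # Length-n1 transforms down each column; the result is indexed cols[k2][k1].
    cols = [dft([rows[a][k2] for a in range(n1)], sign) for k2 in range(n2)]
    # Output map (Chinese remainder theorem): k1 = k mod n1, k2 = k mod n2.
    return [cols[k % n2][k % n1] for k in range(n_pts)]


random.seed(1)
x = [complex(random.random(), random.random()) for _ in range(30)]
err = max(abs(a - b) for a, b in zip(pfa(x, [5, 3, 2]), dft(x)))
print(f"max |PFA - direct| for N = 30: {err:.2e}")
```

The factor list [5, 3, 2] mirrors the N = 30 example used for the argument NI. The mapping works because n·k/N separates into a·k1/n1 + b·k2/n2 modulo 1 once n and k are indexed this way.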

Arguments. The PFA is called using the following arguments.

N = The transform length, which must factor into mutually prime factors from the set 2, 3, 4, 5, 7, 8, 9 and 16.

A  list  of  acceptable  sequence  lengths  is  given  in 
Table  3.11  -  a,b. 

X  and  Y  =  The  real  and  imaginary  data  arrays  containing 
the  sequence  to  be  transformed.  These  arrays  are  dimensioned 
to  length  N. 

NI = The array containing the factors of N. If fewer than four factors are used, the unused entries are set equal to 1. For example, with N=30 we have NI(1)=5, NI(2)=3, NI(3)=2, and NI(4)=1. The unity factors must come last, after the M nonunity factors.

M  =  The  number  of  nonunity  factors.  For  N=30,  M=3. 
UNSC = An output indexing constant which must be precomputed. UNSC = N/(NI(1) + ... + NI(M)).

A  and  B  =  Data  arrays  of  length  N  which  contain  the 
results  of  the  DFT.  The  real  part  is  in  A  and  the 
imaginary  part  is  in  B. 

Usage. To compute the forward single-variate DFT:

(1) Dimension X, Y, A, and B to length N.

(2) Define N, M, and NI(4).

(3) Compute UNSC.

(4) Input the sequence to be transformed in X and Y.

(5) Call PFA(X,Y,A,B,N,M,NI,UNSC).

(6) The Fourier transform results are located in A and B.
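Steps (1) to (3) require splitting N into mutually prime factors from the supported set and counting the nonunity ones. A minimal Python sketch of that bookkeeping follows. The function name `pfa_factors` is hypothetical, the descending order of NI is an assumption chosen only to match the N = 30 example, and UNSC is deliberately left to the formula given above rather than recomputed here.

```python
# Module lengths the PFA accepts, per the set listed in this appendix.
ALLOWED_FACTORS = {2, 3, 4, 5, 7, 8, 9, 16}


def pfa_factors(n):
    """Split n into mutually prime factors drawn from ALLOWED_FACTORS.

    Returns (ni, m): ni is padded with 1s to four entries (unity factors last),
    and m is the number of nonunity factors. Raises ValueError if n is unsupported.
    """
    if n < 2:
        raise ValueError("N must be at least 2")
    factors = []
    rest = n
    for p in (2, 3, 5, 7):
        power = 1
        while rest % p == 0:  # take the full power of p so the factors stay coprime
            rest //= p
            power *= p
        if power > 1:
            if power not in ALLOWED_FACTORS:
                raise ValueError(f"factor {power} of N={n} has no PFA module")
            factors.append(power)
    if rest != 1:
        raise ValueError(f"N={n} has a prime factor other than 2, 3, 5 or 7")
    factors.sort(reverse=True)  # assumption: matches the N=30 example (5, 3, 2)
    m = len(factors)
    return factors + [1] * (4 - m), m


ni, m = pfa_factors(30)
print(ni, m)
```

Because each factor is a full prime power, the factors are automatically mutually prime. The largest length this set allows is 16 · 9 · 7 · 5 = 5040.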




*4  t. 


I 


1 c.l'l= 

:  I  ■  7-  *  ,  - 

1 I.= 

T  4  = :  74  *  •  ,v  -  -7  1 

1 .  Il= 

714  =  ..  74*  |  -  -  1 

•  1  = 

7  1  =  T  1  *  T 1.  t  - 

8  -:=  r  1  -l  ,--T4 

’  r  i'i  i  ■  - 

7  =.  =  r  1  -  t  -  i- r  4 

1  71  n  = 

.  1  =:  i  j  *:■,-■*:  i-; 

!  77  i  = 

•  =:  1  ]  13  4 

i  .  * 1  - 

'-=:.>  l  -4 

!  74  11  = 

: : ;  —  7  7  S  *  i  , ;  4-  4 

’  7=  1  :  — 

r  1  =7  7=  *  ,  *74 

’  77. 11  = 

T3=f  77*  .  73*7;-. 

!  "70  = 

:  l6=c  •  •  7*  7 

1  76  li  = 

T  -:=:  77*i  84  *7  7 

1  7 '4  0= 

!  I7=r  77*  (3  4  *  ; 

;  7  r  1 1  = 

T 4=:'  76*  1  3  4-73 

17:60= 

86=  U  *T6*T3 

1 63  0= 

84=T 1 -T8-T4 

1840= 

,97  =  T  1  -  f  3  *  T 4 

1850= 

S6=U1  *.M6  *U3 

1  O  ■*  1 1  = 

S4=!j  1  -M8  -:j4 

1  87  0= 

S6=U1  -83 *34 

17:7:0= 

X  <  I  <8  :•  >  =8 1  *S8 

17:90= 

X  < I  (  ?) > =81 -38 

1900= 

Y  ( I  <£')  •  =3 1-88 

191  0= 

V  •'  I  (?)  )  =S  1  *88 

1  980= 

X  <  I  >:'3>  >  =83  *84 

1 9  ?  0 = 

X  (  I  •  6.1  .  =83-7  4 

194  0= 

V  < I  t.  3  •  >  =83  -94 

1  '451:1  = 

V  I  >:  6  '•  )  =33*84 

t  96  0  = 

X  '■  T  -  4  '  '  =85-  "7, 

1970= 

X  ■:  I  '.5  '  ■'  =85*38 

107:0  = 

V  <  r  '"4  >  =75  *97 

]  990= 

V  ( I  •:  5  >  )  =3  5  -96 

3  0  0  0  =  :~n  TH  3  0 

r'l  1  f'  =  !* 

7  i'i  i  =:‘ 

i  .4  1 1-  1  c:-.  3  1 

n I 1  =  1 

£  n  7  i'i  =  36 

c’  in-  n=  9'-. 

I  I  '4 1 1  =  3‘4 

c  1  i'i  0  =  r 

l-'11h=  7  4 

.-l-i'  = 

-  1  11=  v.- 

1 4 1 1  =  '  = 

8t50  =  *7 

c' 16  U=  P7 




~ .  1 

1 

■  1  .  ■  *; 

1  r 

,  c,  ,  , 

~ 

T 

.  1  •  ,  _ 

=  V  • 

I 

•  iii*, 

r 

=  V  • 

r 

(  1  '  ■  -V 

1 

iC.'l  • 

« 

1 

•  3  •  '  *;: 

1 

*  ■  .• 

=v » 

1 

1  »  1  — 1>: 

7 

• 1  1 

=v  1 

1 

*  c' .»  1  +•  V 

j 

* »  • 

—  V  •. 

r 

'  r  '  -*  -  V 

1 

1, *  .1 

—  •  1 

1 

.  -.1  1  t-. 

1  1  1  — 

f 

I 

i,ii 

=  V  » 

1 

»  •  •  ¥  v 

7 

*  7  .*  • 

= y  %' 

1 

»" ' *i  "1  -V 

7 

•  7  ‘‘  > 

=>;  < 

r 

• :.4>  >  K, 

I 

i  if*  >  ) 

Appendix J.


Timing Tests on the CDC Cyber 74

The timing tests on the CDC Cyber 74 used the FORTRAN routine SECOND(CP), which, according to the FORTRAN IV reference manual, returns time accurate to "two decimal places", i.e., 0.01 seconds. The results of timing the various DFT algorithms showed that this clock was accurate to three decimal places (0.001 seconds), giving a time resolution of 0.002 seconds. Using three decimal places was justified since almost every standard deviation was less than or equal to 0.002 seconds.

To verify the premise that the count of real operations performed in a DFT is the primary factor determining the execution speed of the algorithm on a computer, the DFT execution times were measured on the CDC Cyber 74. The measured execution speeds of the WFTA, the PFA, and the mixed and fixed radix FFTs were compared to the "predicted" execution speed of each algorithm.

To perform these comparisons, the multiply and add speeds were determined for the Cyber 74 computer.

The  execution  times  of  the  floating  point  multiply  and 
add  instructions  are  given  in  the  CDC  6000  Series  Computer 
Systems  Reference  Manual.  The  execution  times  for  several 
instructions  are  listed  below  and  include  preparing  the  next 
instruction  for  execution: 
Instruction          Assembly Language   Minor Cycles
Floating sum         FXi                  4
Floating product     FXi                 10
Normalize result     NXi                  4
Fetch/store          SAi                  3

where one minor cycle equals 0.1 microsecond (μs). Simply using an add time of 4 × 0.1 μs = 0.4 μs and a multiply time of 10 × 0.1 μs = 1 μs is not sufficient, because the operands must also be fetched and the results stored, which adds more time. To determine the instructions the computer executes for adds and multiplies, the assembly (COMPASS) language code was studied and timed for three cases. First, a DO loop with no operations was executed 100,000 times:

      DO 102 J = 1, N
  102 CONTINUE

The  associated  COMPASS  language  code  was  listed  as  an 
output  of  the  program: 


      BSS   0B
      SB0   B2+7B
      SA5   J
      SA4   N
      SX7   X5+1B
      IX0   X4-X7
      SA7   A5
      PL    X5,(AA
This loop required an average of 2.70 μs per pass (standard deviation 0.03 μs) to execute. Next, the addition instruction was executed 100,000 times using the FORTRAN code:

      DO 102 J = 1, N
  102 TAD = A + B

The  associated  COMPASS  code  for  the  addition  loop  is: 


      BSS   0B
      SB0   B2+7B
      SA5   A
      SA4   B
      SA3   J
      SA2   N
      FX0   X4+X5
      NX7   B0,X0
      SX6   X3+1B
      IX5   X2-X6
      SA6   A3
      SA7   TAD
      PL    X5,(AA

This add loop required an average of 3.34 μs per pass (standard deviation 0.3 μs) to execute. Notice the "extra" instructions of the add loop compared with the no-operation loop:
Command   Operand   Minor Cycles
SA5       A           3
SA4       B           3
FX0       X4+X5       4
NX7       B0,X0       4
SA7       TAD         3
                     --
                     17

Finally, the multiply loop was executed 100,000 times. The FORTRAN code is:

      DO 102 J = 1, N
  102 TAD = A*B

and the corresponding COMPASS code loop is:


      BSS   0B
      SB0   B2+7B
      SA5   B
      SA4   A
      SA3   J
      SA2   N
      FX7   X4*X5
      SX6   X3+1B
      IX0   X2-X6
      SA6   A3
      SA7   TAD
      PL    X5,)AA

The multiply loop averaged 3.37 μs per pass (standard deviation 0.03 μs) to execute. The extra instructions required for the multiply loop relative to the no-operation loop are:
Command   Operand   Minor Cycles
SA5       B           3
SA4       A           3
FX7       X4*X5      10
SA7       TAD         3
                     --
                     19

Comparing the measured execution times of the three loops shows that the add loop is 0.64 μs longer, and the multiply loop 0.67 μs longer, than the "no operation" loop. Based on the minor cycle times of the extra add and multiply instructions, the add loop should be 17 × 0.1 μs = 1.7 μs longer and the multiply loop should be 19 × 0.1 μs = 1.9 μs longer than the "no operation" loop. (Notice that every floating point addition must be "normalized" by the instruction NX7, which requires 4 minor cycles. The floating point product does not require normalization.)
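The arithmetic behind these predicted and measured per-operation times can be checked in a few lines. The Python sketch below uses the minor-cycle counts from the two extra-instruction tables and the measured average loop times quoted above; the variable names are illustrative only.

```python
MINOR_CYCLE_US = 0.1  # one minor cycle = 0.1 microsecond

# Minor cycles of the instructions each loop adds to the empty DO loop
ADD_EXTRA_CYCLES = [3, 3, 4, 4, 3]  # SA5, SA4, FX0 (sum), NX7 (normalize), SA7
MUL_EXTRA_CYCLES = [3, 3, 10, 3]    # SA5, SA4, FX7 (product), SA7

# Measured average time per loop pass, in microseconds
EMPTY_LOOP_US, ADD_LOOP_US, MUL_LOOP_US = 2.70, 3.34, 3.37

predicted_add = sum(ADD_EXTRA_CYCLES) * MINOR_CYCLE_US  # 17 cycles
predicted_mul = sum(MUL_EXTRA_CYCLES) * MINOR_CYCLE_US  # 19 cycles
measured_add = ADD_LOOP_US - EMPTY_LOOP_US
measured_mul = MUL_LOOP_US - EMPTY_LOOP_US

print(f"add:      predicted {predicted_add:.2f} us, measured {measured_add:.2f} us")
print(f"multiply: predicted {predicted_mul:.2f} us, measured {measured_mul:.2f} us")
```

Subtracting the empty-loop time isolates the cost of the extra instructions, which is why the comparison is made against the differences rather than the raw loop times.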

The difference between the measured add and multiply times (0.64 μs and 0.67 μs) and the predicted add and multiply times (1.7 μs and 1.9 μs) results from the very short loops fitting inside the Cyber's "instruction/execution stack", which is a 12-word stack with 60 bits per word. Since each entire loop fit in the stack, its instructions were fetched only once instead of 100,000 times, whereas "all execution times (minor cycles) listed include readying the next instruction for execution". During normal DFT algorithm execution, all of the instructions must be fetched, which means the add time is 1.7 μs and the multiply time is 1.9 μs. These numbers were then used to predict the execution speed of the DFT algorithms.
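Under the operation-count premise, a predicted execution time is a weighted count of real operations. The Python sketch below applies the 1.9 μs and 1.7 μs figures. The brute-force DFT operation counts in the example (4N² real multiplications and N(4N − 2) real additions, counting every complex product in full) are an illustration chosen here, not counts taken from the comparison tables, and the function names are hypothetical.

```python
# Per-operation times derived above for the CDC Cyber 74, in microseconds.
MUL_US = 1.9
ADD_US = 1.7


def predicted_time_us(n_mults, n_adds, mul_us=MUL_US, add_us=ADD_US):
    """Predicted execution time that charges only real multiplies and adds;
    loop control and index arithmetic are ignored."""
    return n_mults * mul_us + n_adds * add_us


def brute_force_dft_counts(n):
    """Real operation counts for a direct complex DFT of length n: each complex
    product costs 4 real multiplies and 2 real adds, and the n - 1 complex sums
    per output cost 2 real adds each."""
    mults = 4 * n * n
    adds = n * (4 * n - 2)
    return mults, adds


mults, adds = brute_force_dft_counts(64)
print(f"N=64 direct DFT: {mults} mults, {adds} adds, "
      f"about {predicted_time_us(mults, adds) / 1e6:.3f} s predicted")
```

Substituting an algorithm's own multiplication and addition counts for the brute-force ones gives the corresponding predicted time.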